Preface 


A time series is a set of ordered observations on a quantitative characteristic of a phenomenon at 
equally spaced time points. The goal of univariate time series analysis is to forecast values of a single 
historical series. The goal of multivariate time series analysis can be to model the relationships 
among component series as well as to forecast those components. 

Time series analysis can be accomplished most effectively by the SAS procedures ARIMA, 
STATESPACE, SPECTRA, and VARMAX. To use these procedures properly, you must (1) 
understand the statistics you need for the analysis and (2) know how to invoke the procedures. SAS 
for Forecasting Time Series, Second Edition, makes it easier for you to apply these procedures to 
your data analysis problems. 

Chapter 1, “Overview of Time Series,” reviews the goals and key characteristics of time series. The 
analysis methods available through SAS/ETS software are presented, beginning with the simpler 
procedures FORECAST, AUTOREG, and X11 and continuing with the more powerful SPECTRA, 
ARIMA, and STATESPACE. This chapter shows the interrelationships among the various 
procedures. It ends with a discussion of linear regression, seasonality in regression, and regression 
with transformed data. 

Chapter 2, “Simple Models: Autoregression,” presents the statistical background necessary to model 
and forecast simple autoregressive (AR) processes. A three-part forecasting strategy is used with 
PROC ARIMA to identify, estimate, and forecast. The backshift notation is used to write a time 
series as a weighted sum of past shocks and to compute covariances through the Yule-Walker 
equations. The chapter ends with an example involving an AR process with regression techniques by 
overfitting. 

Chapter 3, “The General ARIMA Model,” extends the class of models to include moving averages 
and mixed ARMA models. Each model is introduced with its autocovariance function. Estimated 
autocovariances are used to determine a model to be fit, after which PROC ARIMA is used to fit the 
model, forecast future values, and provide forecast intervals. A section on time series identification 
defines the autocorrelation function, partial autocorrelation function, and inverse autocorrelation 
function. Newer identification techniques are also discussed. A catalog of examples is developed, 
and properties useful for associating different forms of these functions with the corresponding time 
series are described. This chapter includes the results of 150 observations generated from each of 
eight sample series. Stationarity and invertibility, nonstationarity, and differencing are discussed. 

Chapter 4, “The ARIMA Model: Introductory Applications,” describes the ARIMA model and its 
introductory applications. Seasonal modeling and model identification are explained, with Box and 
Jenkins’s popular airline data modeled. The chapter combines regression with time series errors to 
provide a richer class of forecasting models. Three cases are highlighted: Case 1 is a typical 
regression, case 2 is a simple transfer function, and case 3 is a general transfer function. 

New in Chapter 4 for the second edition are several interesting intervention examples involving 
analyses of 

□ the effect on calls of charging for directory assistance 

□ the effect on milk purchases of publicity about tainted milk 

□ the effect on airline stock volume of the September 11, 2001, terrorist attacks. 

Chapter 5, “The ARIMA Model: Special Applications,” extends the regression with time series errors 
class of models to cases where the error variance can change over time—the ARCH and GARCH 
class. Multivariate models in which individual nonstationary series vary together over time are 
referred to as “cointegration” or “error correction” models. These are also discussed and illustrated. 
This chapter presents new developments since the first edition of the book. 

Chapter 6, “State Space Modeling,” uses the AR model to motivate the construction of the state 
vector. Next, the equivalence of state space and vector ARMA models is discussed. Examples of 
multivariate processes and their state space equations are shown. The STATESPACE procedure is 
outlined, and a section on canonical correlation analysis and Akaike’s information criterion is 
included. The chapter ends with the analysis of a bivariate series exhibiting feedback, a characteristic 
that cannot be handled with the general ARIMA transfer function approach. 

Chapter 7, “Spectral Analysis,” describes the SPECTRA procedure and how spectral analysis is used 
to detect sinusoidal components in time series models. In periodogram analysis, regressions are run 
on a sequence of values to find hidden periodicities. Spectra for different series, smoothing the 
periodogram, Fourier coefficients, and white noise tests are covered. The chapter ends with a 
discussion of cross-spectral analysis. New for the second edition is more in-depth discussion of tests 
for white noise and the ideas behind spectral analysis. 

Chapter 8, “Data Mining and Forecasting,” deals with the process of forecasting many time series 
with little intervention by the user. The goal of the chapter is to illustrate a modern automated 
interface for a collection of forecasting models, including many that have been discussed thus far. 
Chapter 8 also examines the SAS/ETS Time Series Forecasting System (TSFS), which provides a 
menu-driven interface to SAS/ETS and SAS/GRAPH procedures in order to facilitate quick and easy 
analysis of time series data. The chapter also includes a discussion detailing the use of PROC HPF, 
an automated high-performance forecasting procedure that is designed to forecast thousands of 
univariate time series. 

1.1 Introduction 

This book deals with data collected at equally spaced points in time. The discussion begins with a 
single observation at each point. It continues with k series being observed at each point and then 
analyzed together in terms of their interrelationships. 

One of the main goals of univariate time series analysis is to forecast future values of the series. For 
multivariate series, relationships among component series, as well as forecasts of these components, 
may be of interest. Secondary goals are smoothing, interpolating, and modeling of the structure. 

Three important characteristics of time series are often encountered: seasonality, trend, and 
autocorrelation. 

Seasonality occurs, for example, when data are collected monthly and the value of the series in any 
given month is closely related to the value of the series in that same month in previous years. 
Seasonality can be very regular or can change slowly over a period of years. 

A trend is a regular, slowly evolving change in the series level. Changes that can be modeled by low- 
order polynomials or low-frequency sinusoids fit into this category. For example, if a plot of sales 
over time shows a steady increase of $500 per month, you may fit a linear trend to the sales data. A 
trend is a long-term movement in the series. 

In contrast, autocorrelation is a local phenomenon. When deviations from an overall trend tend to be 
followed by deviations of a like sign, the deviations are positively autocorrelated. Autocorrelation is 
the phenomenon that distinguishes time series from other branches of statistical analysis. 

For example, consider a manufacturing plant that produces computer parts. Normal production is 100 
units per day, although actual production varies from this mean of 100. Variation can be caused by 
machine failure, absenteeism, or incentives like bonuses or approaching deadlines. A machine may 
malfunction for several days, resulting in a run of low productivity. Similarly, an approaching 
deadline may increase production over several days. This is an example of positive autocorrelation, 
with data falling and staying below 100 for a few days, then rising above 100 and staying high for a 
while, then falling again, and so on. 

Another example of positive autocorrelation is the flow rate of a river. Consider variation around the 
seasonal level: you may see high flow rates for several days following rain and low flow rates for 
several days during dry periods. 

Negative autocorrelation occurs less often than positive autocorrelation. An example is a worker's 
attempt to control temperature in a furnace. The autocorrelation pattern depends on the worker's 
habits, but suppose he reads a low value of a furnace temperature and turns up the heat too far and 
similarly turns it down too far when readings are high. If he reads and adjusts the temperature each 
minute, you can expect a low temperature reading to be followed by a high reading. As a second 
example, an athlete may follow a long workout day with a short workout day and vice versa. The 
time he spends exercising daily displays negative autocorrelation. 


1.2 Analysis Methods and SAS/ETS Software 


1.2.1 Options 

When you perform univariate time series analysis, you observe a single series over time. The goal is 
to model the historic series and then to use the model to forecast future values of the series. You can 
use some simple SAS/ETS software procedures to model low-order polynomial trends and 
autocorrelation. PROC FORECAST automatically fits an overall linear or quadratic trend with 
autoregressive (AR) error structure when you specify METHOD=STEPAR. As explained later, AR 
errors are not the most general types of errors that analysts study. For seasonal data you may want to 
fit a Winters exponentially smoothed trend-seasonal model with METHOD=WINTERS. If the trend 
is local, you may prefer METHOD=EXPO, which uses exponential smoothing to fit a local linear or 
quadratic trend. For higher-order trends or for cases where the forecast variable $Y_t$ is related to one or 
more explanatory variables $X_t$, PROC AUTOREG estimates this relationship and fits an AR series as 
an error term. 
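
The three METHOD= choices described above can be sketched as follows. This is a minimal sketch rather than code from the book: it assumes a monthly data set named SALES containing a SAS date variable DATE and a series SALES, and the output data set names (F_STEPAR, F_WINTERS, F_EXPO) are placeholders.

```sas
/* Assumed input: monthly data set SALES with variables DATE and SALES. */

/* Overall linear trend (TREND=2) with autoregressive errors */
proc forecast data=sales interval=month lead=12
              method=stepar trend=2
              out=f_stepar outlimit;
   id date;
   var sales;
run;

/* Winters trend-seasonal smoothing with 12 seasons per year */
proc forecast data=sales interval=month lead=12
              method=winters seasons=12 trend=2
              out=f_winters outlimit;
   id date;
   var sales;
run;

/* Exponential smoothing with a local linear trend */
proc forecast data=sales interval=month lead=12
              method=expo trend=2
              out=f_expo outlimit;
   id date;
   var sales;
run;
```

Each call writes 12 forecasts with their limits (OUTLIMIT) to its own output data set, so the three sets of forecasts can be compared side by side.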

Polynomials in time and seasonal indicator variables (see Section 1.3.2) can be computed as far into 
the future as desired. If the explanatory variable is a nondeterministic time series, however, actual 
future values are not available. PROC AUTOREG treats future values of the explanatory variable as 
known, so user-supplied forecasts of future values with PROC AUTOREG may give incorrect 
standard errors of forecast estimates. More sophisticated procedures like PROC STATESPACE, 
PROC VARMAX, or PROC ARIMA, with their transfer function options, are preferable when the 
explanatory variable's future values are unknown. 

One approach to modeling seasonality in time series is the use of seasonal indicator variables in 
PROC AUTOREG to model a highly regular seasonality. Also, the AR error series from PROC 
AUTOREG or from PROC FORECAST with METHOD=STEPAR can include some correlation at 
seasonal lags (that is, it may relate the deviation from trend at time t to the deviation at time t−12 in 
monthly data). The WINTERS method of PROC FORECAST uses updating equations similar to 
exponential smoothing to fit a seasonal multiplicative model. 
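
The indicator-variable approach might look like the sketch below. The data set SALES, its date variable DATE, and the names T, S1-S11, and SALES2 are assumptions made for illustration. December serves as the baseline month, and NLAG=12 with BACKSTEP lets PROC AUTOREG keep only the autoregressive lags that appear to matter, which can include lag 12.

```sas
/* Assumed input: monthly data set SALES with variables DATE and SALES. */
data sales2;
   set sales;
   t = _n_;                          /* linear time trend */
   array s{11} s1-s11;               /* indicators for January-November */
   do j = 1 to 11;
      s{j} = (month(date) = j);      /* 1 in month j, 0 otherwise */
   end;
   drop j;
run;

proc autoreg data=sales2;
   /* trend + seasonal indicators, AR errors up to lag 12,
      with insignificant lags removed by backward elimination */
   model sales = t s1-s11 / nlag=12 backstep;
run;
```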

Another approach to seasonality is to remove it from the series and to forecast the seasonally 
adjusted series with other seasonally adjusted series used as inputs, if desired. The U.S. Census 
Bureau has adjusted thousands of series with its X-11 seasonal adjustment package. This package is 
the result of years of work by census researchers and is the basis for the seasonally adjusted figures 
that the federal government reports. You can seasonally adjust your own data using PROC X11, 
which is the census program set up as a SAS procedure. If you are using seasonally adjusted figures 
as explanatory variables, this procedure is useful. 

An alternative to using X-11 is to model the seasonality as part of an ARIMA model or, if the 
seasonality is highly regular, to model it with indicator variables or trigonometric functions as 
explanatory variables. A final introductory point about the PROC X11 program is that it identifies 
and adjusts for outliers. 

If you are unsure about the presence of seasonality, you can use PROC SPECTRA to check for it; 
this procedure decomposes a series into cyclical components of various periodicities. Monthly data 
with highly regular seasonality have a large ordinate at period 12 in the PROC SPECTRA output 
SAS data set. Other periodicities, like multiyear business cycles, may appear in this analysis. PROC 
SPECTRA also provides a check on model residuals to see if they exhibit cyclical patterns over time. 
Often these cyclical patterns are not found by other procedures. Thus, it is good practice to analyze 
residuals with this procedure. Finally, PROC SPECTRA relates an output time series $Y_t$ to one or 
more input or explanatory series $X_t$ in terms of cycles. Specifically, cross-spectral analysis estimates 
the change in amplitude and phase when a cyclical component of an input series is used to predict the 
corresponding component of an output series. This enables the analyst to separate long-term 
movements from short-term movements. 
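
A seasonality check of the kind described above could be sketched as follows. The data set SALES and the output name SPEC are assumptions. P requests the periodogram, S the smoothed spectral density estimate, ADJMEAN removes the series mean first, and WHITETEST prints white noise tests.

```sas
/* Assumed input: monthly data set SALES with the series SALES. */
proc spectra data=sales out=spec p s adjmean whitetest;
   var sales;
   weights 1 2 3 2 1;        /* triangular smoothing weights for S */
run;

/* Look at the ordinates near a 12-month period */
proc print data=spec;
   where 11 <= period <= 13;
   var freq period p_01 s_01;
run;
```

In the OUT= data set, P_01 and S_01 hold the periodogram and the smoothed spectral density for the first VAR variable. A large P_01 near PERIOD=12 relative to its neighbors suggests regular annual seasonality.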

Without a doubt, the most powerful and sophisticated methodology for forecasting univariate series 
is the ARIMA modeling methodology popularized by Box and Jenkins (1976). A flexible class of 
models is introduced, and one member of the class is fit to the historic data. Then the model is used 
to forecast the series. Seasonal data can be accommodated, and seasonality can be local; that is, 
seasonality for month t may be closely related to seasonality for this same month one or two years 
previously but less closely related to seasonality for this month several years previously. Local 
trending and even long-term upward or downward drifting in the data can be accommodated in 
ARIMA models through differencing. 
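
As a rough illustration of the three-step ARIMA workflow with differencing and local seasonality, the sketch below fits an airline-type model. It assumes a monthly data set SALES with variables DATE and SALES; the particular orders are illustrative only and would normally be chosen from the identification output.

```sas
/* Assumed input: monthly data set SALES with variables DATE and SALES. */
proc arima data=sales;
   /* Step 1: difference at lags 1 and 12, then inspect the correlations */
   identify var=sales(1,12) nlag=24;
   /* Step 2: multiplicative moving average terms at lags 1 and 12 */
   estimate q=(1)(12) noconstant method=ml;
   /* Step 3: forecast one year ahead with prediction limits */
   forecast lead=12 interval=month id=date out=fcst;
run;
```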

Explanatory time series as inputs to a transfer function model can also be accommodated. Future 
values of nondeterministic, independent input series can be forecast by PROC ARIMA, which, 
unlike the previously mentioned procedures, accounts for the fact that these inputs are forecast when 
you compute prediction error variances and prediction limits for forecasts. A relatively new 
procedure, PROC VARMAX, models vector processes with possible explanatory variables, the X in 
VARMAX. As in PROC STATESPACE, this approach assumes that at each time point you observe 
a vector of responses each entry of which depends on its own lagged values and lags of the other 
vector entries, but unlike STATESPACE, VARMAX also allows explanatory variables X as well as 
cointegration among the elements of the response vector. Cointegration is an idea that has become 
quite popular in recent econometrics. The idea is that each element of the response vector might be a 
nonstationary process, one that has no tendency to return to a mean or deterministic trend function, 
and yet one or more linear combinations of the responses are stationary, remaining near some 
constant. An analogy is two lifeboats adrift in a stormy sea but tied together by a rope. Their location 
might be expressible mathematically as a random walk with no tendency to return to a particular 
point. Over time the boats drift arbitrarily far from any particular location. Nevertheless, because 
they are tied together, the difference in their positions would never be too far from 0. Prices of two 
similar stocks might, over time, vary according to a random walk with no tendency to return to a 
given mean, and yet if they are indeed similar, their price difference may not get too far from 0. 
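
The lifeboat analogy can be imitated with simulated data. In this sketch, which uses made-up names and parameters throughout, both boats share one random walk, so each position wanders without bound while the gap between them stays near 0. The PROC VARMAX step then requests Johansen cointegration tests on the pair.

```sas
/* Simulated data: BOATS, BOAT1, BOAT2, and all constants are illustrative. */
data boats;
   call streaminit(2024);
   common = 0;
   do t = 1 to 200;
      common + rand('normal');               /* shared random walk */
      boat1 = common + 0.5*rand('normal');   /* boat 1 position */
      boat2 = common + 0.5*rand('normal');   /* boat 2 position */
      gap   = boat1 - boat2;                 /* stays near 0 */
      output;
   end;
run;

proc varmax data=boats;
   /* second-order vector autoregression with Johansen cointegration tests */
   model boat1 boat2 / p=2 cointtest=(johansen);
run;
```

Plotting BOAT1, BOAT2, and GAP against T shows the contrast between the wandering levels and the stable difference.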


* Recently the Census Bureau has upgraded X-11, including an option to extend the series using ARIMA models prior to applying the centered 
filters used to deseasonalize the data. The resulting X-12 is incorporated as PROC X12 in SAS software. 

1.2.2 How SAS/ETS Software Procedures Interrelate 

PROC ARIMA emulates PROC AUTOREG if you choose not to model the inputs. ARIMA can also 
fit a richer error structure. Specifically, the error structure can be an autoregressive (AR), moving 
average (MA), or mixed-model structure. PROC ARIMA can emulate PROC FORECAST with 
METHOD=STEPAR if you use polynomial inputs and AR error specifications. However, unlike 
FORECAST, ARIMA provides test statistics for the model parameters and checks model adequacy. 
PROC ARIMA can emulate PROC FORECAST with METHOD=EXPO if you fit a moving average 
of order d to the dth difference of the data. Instead of arbitrarily choosing a smoothing constant, as 
necessary in PROC FORECAST METHOD=EXPO, the data tell you what smoothing constant to use 
when you invoke PROC ARIMA. Furthermore, PROC ARIMA produces more reasonable forecast 
intervals. In short, PROC ARIMA does everything the simpler procedures do and does it better. 
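
For the simplest case, d=1, the emulation of exponential smoothing might be sketched as below. It assumes a data set SALES with the series SALES. A first-order moving average fitted to the first difference gives forecasts of the single exponential smoothing form, with the smoothing weight estimated from the data as 1 minus the fitted moving average coefficient.

```sas
/* Assumed input: data set SALES with the series SALES. */
proc arima data=sales;
   identify var=sales(1) noprint;     /* first difference, d=1 */
   estimate q=1 noconstant method=ml; /* moving average of order 1 */
   forecast lead=12 out=fcst;         /* forecasts with prediction limits */
run;
```

If the estimated moving average coefficient is, say, 0.7, the implied smoothing weight is 0.3. With PROC FORECAST METHOD=EXPO you would instead have to supply that weight yourself.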

However, to benefit from this additional flexibility and sophistication in software, you must have 
enough expertise and time to analyze the series. You must be able to identify and specify the form of 
the time series model using the autocorrelations, partial autocorrelations, inverse autocorrelations, 
and cross-correlations of the time series. Later chapters explain in detail what these terms mean and 
how to use them. Once you identify a model, fitting and forecasting are almost automatic. 

The identification process is more complicated when you use input series. For proper identification, 
the ARIMA methodology requires that inputs be independent of each other and that there be no 
feedback from the output series to the input series. For example, if the temperature $T_t$ in a room at 
time t is to be explained by current and lagged furnace temperatures $F_t$, lack of feedback corresponds 
to there being no thermostat in the room. A thermostat causes the furnace temperature to adjust to 
recent room temperatures. These ARIMA restrictions may be unrealistic in many examples. You can 
use PROC STATESPACE and PROC VARMAX to model multiple time series without these 
restrictions. 

Although PROC STATESPACE and PROC VARMAX are sophisticated in theory, they are easy to 
run in their default mode. The theory allows you to model several time series together, accounting for 
relationships of individual component series with current and past values of the other series. 

Feedback and cross-correlated input series are allowed. Unlike PROC ARIMA, PROC 
STATESPACE uses an information criterion to select a model, thus eliminating the difficult 
identification process in PROC ARIMA. For example, you can put data on sales, advertising, 
unemployment rates, and interest rates into the procedure and automatically produce forecasts of 
these series. It is not necessary to intervene, but you must be certain that you have a property known 
as stationarity in your series to obtain theoretically valid results. The stationarity concept is discussed 
in Chapter 3, “The General ARIMA Model,” where you will learn how to make nonstationary series 
stationary. 
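
A default-mode run of the kind described above might be sketched as follows. The data set ECON and its variables are hypothetical. The (1) after each variable asks the procedure to difference that series once, on the assumption that first differences are stationary.

```sas
/* Hypothetical input: monthly data set ECON with variables
   DATE, SALES, ADV, UNEMP, and RATE. */
proc statespace data=econ lead=12 out=fcst;
   var sales(1) adv(1) unemp(1) rate(1);   /* model first differences */
   id date;
run;
```

The procedure selects the model by an information criterion and writes 12-step-ahead forecasts of all four series to FCST.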

Although the automatic modeling in PROC STATESPACE sounds appealing, two papers in the 
Proceedings of the Ninth Annual SAS Users Group International Conference (one by Bailey and the 
other by Chavern) argue that you should use such automated procedures cautiously. Chavern gives an 
example in which PROC STATESPACE, in its default mode, fails to give as accurate a forecast as a 
certain vector autoregression. (However, the stationarity of the data is questionable, and stationarity 
is required to use PROC STATESPACE appropriately.) Bailey shows a PROC STATESPACE 
forecast considerably better than its competitors in some time intervals but not in others. In SAS 
Views: SAS Applied Time Series Analysis and Forecasting, Brocklebank and Dickey generate data 
from a simple MA model and feed these data into PROC STATESPACE in the default mode. The 
dimension of the model is overestimated when 50 observations are used, but the procedure is 
successful for samples of 100 and 500 observations from this simple series. Thus, it is wise to 
consider intervening in the modeling procedure through PROC STATESPACE’s control options. If a 
transfer function model is appropriate, PROC ARIMA is a viable alternative. 

This chapter introduces some techniques for analyzing and forecasting time series and lists the SAS 
procedures for the appropriate computations. As you continue reading the rest of the book, you may 
want to refer back to this chapter to clarify the relationships among the various procedures. 

Figure 1.1 shows the interrelationships among the SAS/ETS software procedures mentioned. 

Table 1.1 lists some common questions and answers concerning the procedures. 


Figure 1.1 How SAS/ETS Software Procedures Interrelate 

Table 1.1 Selected Questions and Answers Concerning SAS/ETS Software Procedures 

Questions 

1. Is a frequency domain analysis (F) or time domain analysis (T) conducted? 

2. Are forecasts automatically generated? 

3. Do predicted values have 95% confidence limits? 

4. Can you supply leading indicator variables or explanatory variables? 

5. Does the procedure run with little user intervention? 

6. Is minimal time series background required for implementation? 

7. Does the procedure handle series with embedded missing values? 

Answers 


SAS/ETS Procedures                 1     2     3     4     5     6     7 

FORECAST                           T     Y     Y     N'    Y     Y     Y 
AUTOREG                            T     Y*    Y     Y     Y     Y     Y 
X11                                T     Y*    N     N     Y     Y     N 
X12                                T     Y*    Y     Y     Y     N     Y 
SPECTRA                            F     N     N     N     Y     N     N 
ARIMA                              T     Y*    Y     Y     N     N     N 
STATESPACE                         T     Y     Y*    Y     Y     N     N 
VARMAX                             T     Y     Y     Y     Y     N     N 
MODEL                              T     Y*    Y     Y     Y     N     Y 
Time Series Forecasting System     T     Y     Y     Y     Y     Y     Y 

* = requires user intervention 
' = supplied by the program 
F = frequency domain analysis 
N = no 
T = time domain analysis 
Y = yes 

1.3 Simple Models: Regression 


1.3.1 Linear Regression 

This section introduces linear regression, an elementary but common method of mathematical 
modeling. Suppose that at time t you observe $Y_t$. You also observe explanatory variables $X_{1t}$, $X_{2t}$, and 
so on. For example, $Y_t$ could be sales in month t, $X_{1t}$ could be advertising expenditure in month t, and 
$X_{2t}$ could be competitors' sales in month t. Output 1.1 shows a simple plot of monthly sales versus 
date. 

A multiple linear regression model relating the variables is 

$$Y_t = \beta_0 + \beta_1 X_{1t} + \beta_2 X_{2t} + \varepsilon_t$$

For this model, assume that the errors $\varepsilon_t$ 

• have the same variance at all times t 

• are uncorrelated with each other ($\varepsilon_t$ and $\varepsilon_s$ are uncorrelated for t different from s) 

• have a normal distribution. 


These assumptions allow you to use standard regression methodology, such as PROC REG or PROC 
GLM. For example, suppose you have 80 observations and you issue the following statements: 

TITLE "PREDICTING SALES USING ADVERTISING"; 
TITLE2 "EXPENDITURES AND COMPETITORS' SALES"; 
PROC REG DATA=SALES; 
   MODEL SALES=ADV COMP / DW; 
   OUTPUT OUT=OUT1 P=P R=R; 
RUN; 

Output 1.2 shows the estimates of $\beta_0$, $\beta_1$, and $\beta_2$ ❶. The standard errors ❷ are incorrect if the 
assumptions on $\varepsilon_t$ are not satisfied. You have created an output data set called OUT1 and have 
called for the Durbin-Watson option to check on these error assumptions. 


Output 1.2  Performing a Multiple Regression 

                 PREDICTING SALES USING ADVERTISING 
                EXPENDITURES AND COMPETITORS' SALES 

The REG Procedure 
Model: MODEL1 
Dependent Variable: SALES 

                          Analysis of Variance 

                         Sum of           Mean 
Source        DF        Squares         Square    F Value    Prob>F 
Model          2   2.5261822E13   1.2630911E13     51.140    0.0001 
Error         77   1.9018159E13   246989077881 
C Total       79    4.427998E13 

Root MSE     496979.95722     R-square    0.5705 
Dep Mean    3064722.70871     Adj R-sq    0.5593 
C.V.             16.21615 

                          Parameter Estimates 

                  ❶ Parameter    ❷ Standard    T for H0: 
Variable    DF       Estimate         Error  Parameter=0    Prob > |T| 
INTERCEP     1        2700165  373957.39855        7.221        0.0001 
ADV          1      10.179675    1.91704684        5.310        0.0001 
COMP         1      -0.605607    0.08465433       -7.154        0.0001 

Durbin-Watson D                1.394 ❸ 
(For Number of Obs.)              80 
1st Order Autocorrelation      0.283 ❹ 



The test statistics produced by PROC REG are designed specifically to detect departures from the 
null hypothesis ($H_0$: uncorrelated) of the form 

$$H_1: \varepsilon_t = \rho\,\varepsilon_{t-1} + e_t$$

where $|\rho| < 1$ and $e_t$ is an uncorrelated series. This type of error term, in which $\varepsilon_t$ is related to 
$\varepsilon_{t-1}$, is called an AR (autoregressive) error of the first order. 

The Durbin-Watson option in the MODEL statement produces the Durbin-Watson test statistic ❸ 

$$d = \sum_{t=2}^{n} (\hat\varepsilon_t - \hat\varepsilon_{t-1})^2 \Big/ \sum_{t=1}^{n} \hat\varepsilon_t^2$$

where 

$$\hat\varepsilon_t = Y_t - \hat\beta_0 - \hat\beta_1 X_{1t} - \hat\beta_2 X_{2t}$$

If the actual errors $\varepsilon_t$ are uncorrelated, the numerator of d has an expected value of about $2(n-1)\sigma^2$ 
and the denominator has an expected value of approximately $n\sigma^2$. Thus, if the errors $\varepsilon_t$ are 
uncorrelated, the ratio d should be approximately 2. 

Positive autocorrelation means that $\varepsilon_t$ is closer to $\varepsilon_{t-1}$ than in the independent case, so 
$|\varepsilon_t - \varepsilon_{t-1}|$ should be smaller. It follows that d should also be smaller. The smallest possible 
value for d is 0. If d is significantly less than 2, positive autocorrelation is present. 
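
To see what the DW option computes, the statistic can be rebuilt by hand from the residuals that the earlier PROC REG step saved in OUT1 (variable R). The sketch below is illustrative only. It assumes that OUT1 is in time order with no missing residuals, and the names DWCHECK, PREV, NUM, CROSS, and DEN are made up.

```sas
data dwcheck;
   set out1 end=last;
   prev = lag(r);                 /* residual from the previous period */
   if _n_ > 1 then do;
      num   + (r - prev)**2;      /* numerator of d */
      cross + r * prev;           /* numerator of the lag 1 autocorrelation */
   end;
   den + r**2;                    /* common denominator */
   if last then do;
      d   = num / den;
      rho = cross / den;
      put d= rho=;                /* write both values to the log */
      output;
   end;
   keep d rho;
run;
```

The value of D should agree with the Durbin-Watson D that PROC REG prints, and RHO should be close to the printed first-order autocorrelation.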

When is a Durbin-Watson statistic significant? The answer depends on the number of coefficients in 
the regression and on the number of observations. In this case, you have k=3 coefficients 
($\beta_0$, $\beta_1$, and $\beta_2$ for the intercept, ADV, and COMP) and n=80 observations. In general, if you want 
to test for positive autocorrelation at the 5% significance level, you must compare d=1.394 to a 
critical value. Even with k and n fixed, the critical value can vary depending on actual values of the 
independent variables. The results of Durbin and Watson imply that if k=3 and n=80, the critical 
value must be between $d_L$=1.59 and $d_U$=1.69. Since d is less than $d_L$, you would reject the null 
hypothesis of uncorrelated errors in favor of the alternative: positive autocorrelation. If d>2, which 
is evidence of negative autocorrelation, compute d'=4−d and compare the result to $d_L$ and $d_U$. 
Specifically, if d' (for example, 1.954) were greater than 1.69, you would be unable to reject the null 
hypothesis of uncorrelated errors. If d' were less than 1.59, you would reject the null hypothesis of 
uncorrelated errors in favor of the alternative: negative autocorrelation. Note that if 

$$1.59 < d < 1.69$$

you cannot be sure whether d is to the left or right of the actual critical value c because you know 
only that 

$$1.59 < c < 1.69$$

Durbin and Watson have constructed tables of bounds for the critical values. Most tables use k'=k−1, 
which equals the number of explanatory variables excluding the intercept, and n (number of 
observations) to obtain the bounds $d_L$ and $d_U$ for any given regression (Draper and Smith 1998).* 
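
The bounds test just described reduces to a three-way decision, which can be sketched in a short DATA step. The value of D is the one reported in Output 1.2 and the bounds are those quoted above for k=3 and n=80. The variable names are illustrative.

```sas
data _null_;
   length side $ 8 decision $ 60;
   d  = 1.394;                  /* Durbin-Watson D from Output 1.2 */
   dl = 1.59;                   /* lower bound for k=3, n=80 */
   du = 1.69;                   /* upper bound for k=3, n=80 */
   if d <= 2 then do;           /* test for positive autocorrelation */
      dstar = d;
      side  = 'positive';
   end;
   else do;                     /* test for negative autocorrelation */
      dstar = 4 - d;
      side  = 'negative';
   end;
   if dstar < dl then decision = catx(' ', 'reject H0:', side, 'autocorrelation');
   else if dstar > du then decision = 'do not reject H0';
   else decision = 'inconclusive: statistic lies between the bounds';
   put d= dstar= decision=;
run;
```

The middle branch is the price of using bounds rather than the exact critical value: when the statistic falls between the two bounds, the tables alone cannot settle the question.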

Three warnings apply to the Durbin-Watson test. First, it is designed to detect first-order AR errors. 
Although this type of autocorrelation is only one possibility, it seems to be the most common. The 
test has some power against other types of autocorrelation. Second, the Durbin-Watson bounds do 
not hold when lagged values of the dependent variable appear on the right side of the regression. 
Thus, if the example had used last month's sales to help explain this month's sales, you would not 
know correct bounds for the critical value. Third, if you incorrectly specify the model, the Durbin- 
Watson statistic often lies in the critical region even though no real autocorrelation is present. 
Suppose an important variable, such as $X_{3t}$ = product availability, had been omitted in the sales 
example. This omission could produce a significant d. Some practitioners use d as a lack-of-fit 
statistic, which is justified only if you assume a priori that a correctly specified model cannot have 
autocorrelated errors and, thus, that significance of d must be due to lack of fit. 


* Exact p-values for d are now available in PROC AUTOREG, as will be seen in Output 1.2A later in this section. 

The output also produced a first-order autocorrelation, ❹ denoted as 

$$\hat\rho = 0.283$$

When n is large and the errors are uncorrelated, 

$$n^{1/2}\hat\rho \big/ (1-\hat\rho^2)^{1/2}$$

is approximately distributed as a standard normal variate. Thus, a value of 

$$n^{1/2}\hat\rho \big/ (1-\hat\rho^2)^{1/2}$$

exceeding 1.645 is significant evidence of positive autocorrelation at the 5% significance level. This 
is especially helpful when the number of observations exceeds the largest in the Durbin-Watson 
table. For example, 

$$\sqrt{80}\,(0.283) \big/ \sqrt{1 - 0.283^2} = 2.639$$
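
The arithmetic of this large-sample test is easy to reproduce. The short DATA step below is an illustrative sketch using the values from Output 1.2. PROBNORM is the standard normal distribution function, so 1 − PROBNORM(z) is the one-sided p-value.

```sas
data _null_;
   n   = 80;                            /* number of observations */
   rho = 0.283;                         /* first-order autocorrelation */
   z   = sqrt(n) * rho / sqrt(1 - rho**2);
   p   = 1 - probnorm(z);               /* one-sided p-value */
   put z= 8.3 p= 8.4;
run;
```

The computed z matches the 2.639 shown above and exceeds the 5% critical value of 1.645.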

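You should use this test only for large n values. It is subject to the three warnings given for the 
Durbin-Watson test. Because of the approximate nature of the $n^{1/2}\hat\rho/(1-\hat\rho^2)^{1/2}$ test, the 
Durbin-Watson test is preferable. In general, d is approximately $2(1-\hat\rho)$. 

This is easily seen by noting that 

$$\hat\rho = \sum \hat\varepsilon_t \hat\varepsilon_{t-1} \Big/ \sum \hat\varepsilon_t^2$$

and 

$$d = \sum (\hat\varepsilon_t - \hat\varepsilon_{t-1})^2 \Big/ \sum \hat\varepsilon_t^2$$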
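
The approximation can be spelled out by expanding the square in the numerator of d. The step below is a sketch of the standard argument, using residuals $\hat\varepsilon_t$ and the lag 1 autocorrelation $\hat\rho$. Each of the first two sums in the expanded numerator differs from the denominator by only one end term, so each ratio is close to 1.

```latex
\begin{aligned}
d &= \frac{\sum_{t=2}^{n}(\hat\varepsilon_t-\hat\varepsilon_{t-1})^2}
          {\sum_{t=1}^{n}\hat\varepsilon_t^2}
   = \frac{\sum_{t=2}^{n}\hat\varepsilon_t^2
           +\sum_{t=2}^{n}\hat\varepsilon_{t-1}^2
           -2\sum_{t=2}^{n}\hat\varepsilon_t\hat\varepsilon_{t-1}}
          {\sum_{t=1}^{n}\hat\varepsilon_t^2} \\
  &\approx 1 + 1 - 2\hat\rho \;=\; 2(1-\hat\rho)
\end{aligned}
```

With $\hat\rho = 0.283$ this gives $2(1-0.283) = 1.434$, which is in rough agreement with the Durbin-Watson D reported in Output 1.2.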

Durbin and Watson also gave a computer-intensive way to compute exact p-values for their test 
statistic d. This has been incorporated in PROC AUTOREG. For the sales data, you issue this code to 
fit a model for sales as a function of this-period and last-period advertising. 

PROC AUTOREG DATA=NCSALES; 
   MODEL SALES=ADV ADV1 / DWPROB; 
RUN; 

The resulting Output 1.2A shows a significant d=.5427 (p-value .0001 < .05). Could this be because 
of an omitted variable? Try the model with competitor’s sales included. 

PROC AUTOREG DATA=NCSALES; 
   MODEL SALES=ADV ADV1 COMP / DWPROB; 
RUN; 


Now, in Output 1.2B, d =1.8728 is insignificant (p-value .2239 > .05). Note also the increase in 
R-square (the proportion of variation explained by the model) from 39% to 82%. What is the effect 
of an increase of $1 in advertising expenditure? It gives a sales increase estimated at $6.04 this 
period but a decrease of $5.19 next period. You wonder if the true coefficients on ADV and ADV1 
are the same with opposite signs; that is, you wonder if these coefficients add to 0. If they do, then 
the increase we get this period from advertising is followed by a decrease of equal magnitude next 
period. This means our advertising dollar simply shifts the timing of sales rather than increasing the 
level of sales. Having no autocorrelation evident, you fit the model in PROC REG asking for a test 
that the coefficients of ADV and ADV1 add to 0. 

PROC REG DATA = SALES; 
   MODEL SALES = ADV ADV1 COMP; 
   TEMPR: TEST ADV+ADV1=0; 
RUN; 


Output 1.2C gives the results. Notice that the regression is exactly that given by PROC AUTOREG 
with no NLAG= specified. The p-value (.077>.05) is not small enough to reject the hypothesis that 
the coefficients are of equal magnitude, and thus it is possible that advertising just shifts the timing, a 
temporary effect. Note the label TEMPR on the test. 

Note also that, although we may have information on our company’s plans to advertise, we would 
likely not know what our competitor’s sales will be in future months, so at best we would have to 
substitute estimates of these future values in forecasting our sales. It appears that an increase of $1.00 
in our competitor’s sales is associated with a $0.56 decrease in our sales. 

From Output 1.2C the forecasting equation is seen to be 

PREDICTED SALES = 35967 - 0.563227COMP + 6.038203ADV - 5.188384ADV1 
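
To make the forecasting equation concrete, the DATA step below applies it to one set of inputs. The input values are hypothetical and chosen only for illustration. As the text notes, COMP is unknown in future months, so the value used here stands in for an estimate.

```sas
data _null_;
   adv  = 5000;      /* hypothetical advertising this period */
   adv1 = 4000;      /* hypothetical advertising last period */
   comp = 30000;     /* hypothetical estimate of competitor's sales */
   pred = 35967 - 0.563227*comp + 6.038203*adv - 5.188384*adv1;
   net  = 6.038203 - 5.188384;   /* sum of the ADV and ADV1 coefficients */
   put pred= comma12.2 net= 8.6;
run;
```

NET is the estimated lasting effect of one advertising dollar. It is the quantity that the TEMPR test compares with 0.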


Output 1.2A Predicting Sales from Advertising

AUTOREG Procedure

Dependent Variable = SALES
Ordinary Least Squares Estimates

SSE              5.1646E9    DFE              77
MSE              67072080    Root MSE         8189.755
SBC              1678.821    AIC              1671.675
Reg Rsq          0.3866      Total Rsq        0.3866
Durbin-Watson    0.5427      PROB<DW          0.0001

Variable     DF    B Value       Std Error    t Ratio    Approx Prob
Intercept     1    14466         8532.1         1.695    0.0940
ADV           1    6.560093      0.9641         6.804    0.0001
ADV1          1    -5.015231     0.9606        -5.221    0.0001

Output 1.2B Predicting Sales from Advertising and Competitor’s Sales

PREDICTING SALES USING ADVERTISING
EXPENDITURES AND COMPETITOR'S SALES

AUTOREG Procedure

Dependent Variable = SALES
Ordinary Least Squares Estimates

SSE              1.4877E9    DFE              76
MSE              19575255    Root MSE         4424.393
SBC              1583.637    AIC              1574.109
Reg Rsq          0.8233      Total Rsq        0.8233
Durbin-Watson    1.8728      PROB<DW          0.2239

Variable     DF    B Value       Std Error    t Ratio    Approx Prob
Intercept     1    35967         4869.0         7.387    0.0001
COMP          1    -0.563227     0.0411       -13.705    0.0001
ADV           1    6.038203      0.5222        11.562    0.0001
ADV1          1    -5.188384     0.5191        -9.994    0.0001

Output 1.2C Predicting Sales from Advertising and Competitor’s Sales

PREDICTING SALES USING ADVERTISING
EXPENDITURES AND COMPETITOR'S SALES

Dependent Variable: SALES

Analysis of Variance

                     Sum of            Mean
Source      DF       Squares           Square          F Value    Prob>F
Model        3       6931264991.2      2310421663.7    118.028    0.0001
Error       76       1487719368.2      19575254.845
C Total     79       8418984359.4

Root MSE     4424.39316     R-square    0.8233
Dep Mean    29630.21250     Adj R-sq    0.8163
C.V.           14.93203

Parameter Estimates

                   Parameter     Standard        T for H0:
Variable    DF     Estimate      Error           Parameter=0    Prob > |T|
INTERCEP     1     35967         4869.0048678        7.387      0.0001
COMP         1     -0.563227     0.04109605        -13.705      0.0001
ADV          1     6.038203      0.52224284         11.562      0.0001
ADV1         1     -5.188384     0.51912574         -9.994      0.0001

Durbin-Watson D                1.873
(For Number of Obs.)              80
1st Order Autocorrelation      0.044

PREDICTING SALES USING ADVERTISING
EXPENDITURES AND COMPETITOR'S SALES

Dependent Variable: SALES

Test: TEMPR    Numerator:   63103883.867    DF:  1    F value:  3.2237
               Denominator:     19575255    DF: 76    Prob>F:   0.0766

1.3.2 Highly Regular Seasonality

Occasionally, a very regular seasonality occurs in a series, such as an average monthly temperature at 
a given location. In this case, you can model seasonality by computing means. Specifically, the mean 
of all the January observations estimates the seasonal level for January. Similar means are used for 
other months throughout the year. An alternative to computing the twelve means is to run a 
regression on monthly indicator variables. An indicator variable takes on values of 0 or 1. For the 
January indicator, the 1s occur only for observations made in January. You can compute an indicator
variable for each month and regress $Y_t$ on the twelve indicators with no intercept. You can also
regress $Y_t$ on a column of 1s and eleven of the indicator variables. The intercept now estimates the
level for the month associated with the omitted indicator, and the coefficient of any indicator column 
is added to the intercept to compute the seasonal level for that month. 
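The two parameterizations describe the same fitted seasonal levels. The sketch below, in Python rather than SAS, uses a small invented data set (three "months", two observations each) to show how the intercept-plus-indicator coefficients follow from the monthly means. It relies on the standard fact that least squares on a full set of indicators reproduces the group means; the data and function names are hypothetical.

```python
from statistics import mean

# Hypothetical (month, value) observations, invented for illustration.
obs = [(1, 40.0), (2, 43.0), (3, 51.0), (1, 42.0), (2, 45.0), (3, 49.0)]

def seasonal_means(observations):
    """Mean per month: the coefficients of the no-intercept indicator regression."""
    by_month = {}
    for month, value in observations:
        by_month.setdefault(month, []).append(value)
    return {month: mean(values) for month, values in sorted(by_month.items())}

def intercept_form(means, omitted):
    """Intercept = level of the omitted month; other coefficients are shifts from it."""
    intercept = means[omitted]
    shifts = {m: level - intercept for m, level in means.items() if m != omitted}
    return intercept, shifts

means = seasonal_means(obs)
intercept, shifts = intercept_form(means, omitted=3)
```

Adding a month's shift back to the intercept recovers that month's mean, which is the interpretation given in the text.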

For further illustration, Output 1.3 shows a series of quarterly increases in North Carolina retail
sales; that is, each point is the sales for that quarter minus the sales for the previous quarter. 

Output 1.4 shows a plot of the monthly sales through time. Quarterly sales were computed as 
averages of three consecutive months and are used here to make the presentation brief. A model for 
the monthly data will be shown in Chapter 4. Note that there is a strong seasonal pattern here and 
perhaps a mild trend over time. The change data are plotted in Output 1.6. To model the seasonality, 
use S1, S2, and S3, and for the trend, use time, T1, and its square T2. The S variables are often
referred to as indicator variables, being indicators of the season, or dummy variables. The first 
CHANGE value is missing because the sales data start in quarter 1 of 1983 so no increase can be 
computed for that quarter. 


Output 1.3 Displaying North Carolina Retail Sales Data Set

OBS    DATE      CHANGE    S1    S2    S3    T1      T2
  1    83Q1           .     1     0     0     1       1
  2    83Q2     1678.41     0     1     0     2       4
  3    83Q3      633.24     0     0     1     3       9
  4    83Q4      662.35     0     0     0     4      16
  5    84Q1    -1283.59     1     0     0     5      25
(More Output Lines)
 47    94Q3      543.61     0     0     1    47    2209
 48    94Q4     1526.95     0     0     0    48    2304

Output 1.4 Plotting North Carolina Monthly Sales

[Plot: NORTH CAROLINA RETAIL SALES IN MILLION $, MONTHLY DATA WITH YEAR MARKERS STARTING WITH JANUARY 1983; vertical axis: SALES; horizontal axis: DATE]


Now issue these commands: 

PROC AUTOREG DATA=ALL; 

MODEL CHANGE = T1 T2 S1 S2 S3 / DWPROB;
RUN;

This gives Output 1.5.


Output 1.5 Using PROC AUTOREG to Get the Durbin-Watson Test Statistic

AUTOREG Procedure

Dependent Variable = CHANGE

Ordinary Least Squares Estimates

SSE              5290128     DFE              41
MSE              129027.5    Root MSE         359.204
SBC              703.1478    AIC              692.0469
Reg Rsq          0.9221      Total Rsq        0.9221
Durbin-Watson    2.3770      PROB<DW          0.8608

Variable     DF    B Value         Std Error    t Ratio    Approx Prob
Intercept     1    679.427278      200.1          3.395    0.0015
T1            1    -44.992888      16.4428       -2.736    0.0091
T2            1    0.991520        0.3196         3.102    0.0035
S1            1    -1725.832501    150.3        -11.480    0.0001
S2            1    1503.717849     146.8         10.240    0.0001
S3            1    -221.287056     146.7         -1.508    0.1391


PROC AUTOREG is intended for regression models with autoregressive errors. An example of a
model with autoregressive errors is

$$Y_t = \beta_0 + \beta_1 X_{1t} + \beta_2 X_{2t} + Z_t$$

where

$$Z_t = \rho Z_{t-1} + e_t$$

Note how the error term $Z_t$ is related to a lagged value of itself in an equation that resembles a
regression equation; hence the term “autoregressive.” The term $e_t$ represents the portion of $Z_t$ that
could not have been predicted from previous Z values and is often called an unanticipated “shock” or
“white noise.” It is assumed that the e series is independent and identically distributed. This one lag
error model is fit using the /NLAG=1 option in the MODEL statement. Alternatively, the options
/NLAG=5 BACKSTEP can be used to try 5 lags of Z, automatically deleting those deemed
statistically insignificant.
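To see what a one-lag autoregressive error looks like numerically, the following Python sketch builds an error series from a list of shocks and computes the Durbin-Watson statistic using its standard definition (sum of squared successive differences divided by the sum of squares). The shock values, the starting value of 0 for the error series, and the function names are assumptions made for illustration; this is not how PROC AUTOREG is implemented.

```python
def ar1_errors(rho, shocks, z0=0.0):
    """Build Z_t = rho * Z_{t-1} + e_t from a list of shocks e_t (Z_0 assumed to be z0)."""
    errors, previous = [], z0
    for shock in shocks:
        previous = rho * previous + shock
        errors.append(previous)
    return errors

def durbin_watson(resid):
    """Standard Durbin-Watson d: values well below 2 suggest positive autocorrelation."""
    numerator = sum((resid[i] - resid[i - 1]) ** 2 for i in range(1, len(resid)))
    denominator = sum(r * r for r in resid)
    return numerator / denominator

shocks = [1.0, -0.5, 0.25, 0.75, -1.0, 0.5]  # hypothetical shock values
d_independent = durbin_watson(ar1_errors(0.0, shocks))  # rho = 0: errors equal the shocks
d_positive = durbin_watson(ar1_errors(0.8, shocks))     # rho = 0.8: carry-over between periods
```

With these particular shocks, the statistic drops when the same shocks are passed through the recursion with a positive value of rho.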

Our retail sales change data require no autocorrelation adjustment. The Durbin-Watson test has a 
p-value 0.8608>0.05; so there is no evidence of autocorrelation in the errors. The fitting of the model 
is the same as in PROC REG because no NLAG specification was issued in the MODEL statement. 
The parameter estimates are interpreted just as they would be in PROC REG; that is, the predicted 
change PC in quarter 4 (where S1=S2=S3=0) is given by 

$$\mathrm{PC} = 679.4 - 44.99\,t + 0.99\,t^2$$

and in quarter 1 (where S1=1, S2=S3=0) is given by

$$\mathrm{PC} = 679.4 - 1725.83 - 44.99\,t + 0.99\,t^2$$

etc. Thus the coefficients of S1, S2, and S3 represent shifts in the quadratic polynomial associated
with the first through third quarters and the remaining coefficients calibrate the quadratic function to 
the fourth quarter level. In Output 1.6 the data are dots, and the fourth quarter quadratic predicting 
function is the smooth curve. Vertical lines extend from the quadratic, indicating the seasonal shifts 
required for the other three quarters. The broken line gives the predictions. The last data point for 
1994Q4 is indicated with an extended vertical line. Notice that the shift for any quarter is the same 
every year. This is a property of the dummy variable model and may not be reasonable for some data; 
for example, sometimes seasonality is slowly changing over a period of years. 
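The predicting function can be written out as a short calculation. This Python sketch uses the full-precision estimates from Output 1.5 and reproduces several of the predicted values that appear later in Output 1.9; the function name and argument layout are illustrative assumptions.

```python
# Estimates copied from Output 1.5.
INTERCEPT, B_T1, B_T2 = 679.427278, -44.992888, 0.991520
SEASONAL_SHIFT = {1: -1725.832501, 2: 1503.717849, 3: -221.287056, 4: 0.0}

def predicted_change(t, quarter):
    """Quadratic in time for quarter 4, shifted by the dummy coefficient for quarters 1-3."""
    return INTERCEPT + B_T1 * t + B_T2 * t * t + SEASONAL_SHIFT[quarter]

# Observations 49-52 are the four quarters of 1995 (T1 = 49, ..., 52).
forecasts_1995 = [round(predicted_change(48 + q, q), 1) for q in (1, 2, 3, 4)]
```

The same seasonal shift is applied every year; only the quadratic part changes with t.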


Output 1.6 Plotting Quarterly Sales Increase with Quadratic Predicting Function

[Plot: QUARTERLY SALES INCREASE; vertical axis: change; horizontal axis: date]


To forecast into the future, extrapolate the linear and quadratic terms and the seasonal dummy
variables the requisite number of periods. The data set EXTRA listed in Output 1.7 contains such
values. Notice that there is no question about the future values of these, unlike the case of
competitor’s sales that was considered in an earlier example. The PROC AUTOREG technology
assumes perfectly known future values of the explanatory variables. Set the response variable,
CHANGE, to missing.

Output 1.7 Data Appended for Forecasting

OBS    DATE    CHANGE    S1    S2    S3    T1      T2
  1    95Q1         .     1     0     0    49    2401
  2    95Q2         .     0     1     0    50    2500
  3    95Q3         .     0     0     1    51    2601
  4    95Q4         .     0     0     0    52    2704
  5    96Q1         .     1     0     0    53    2809
  6    96Q2         .     0     1     0    54    2916
  7    96Q3         .     0     0     1    55    3025
  8    96Q4         .     0     0     0    56    3136


Combine the original data set—call it NCSALES—with the data set EXTRA as follows: 
DATA ALL; 

SET NCSALES EXTRA; 

RUN; 


Now run PROC AUTOREG on the combined data, noting that the extra data cannot contribute to the 
estimation of the model parameters since CHANGE is missing. The extra data have full information 
on the explanatory variables and so predicted values (forecasts) will be produced. The predicted 
values P are output into a data set OUT1 using this statement in PROC AUTOREG: 

OUTPUT OUT=OUT1 PM=P; 

Using PM= requests that the predicted values be computed only from the regression function without 
forecasting the error term Z. If NLAG= is specified, a model is fit to the regression residuals and this 
model can be used to forecast residuals into the future. Replacing PM= with P= adds forecasts of 
future Z values to the forecast of the regression function. The two types of forecast, with and without 
forecasting the residuals, point out the fact that part of the predictability comes from the explanatory 
variables, and part comes from the autocorrelation—that is, from the momentum of the series. Thus, 
as seen in Output 1.5, there is a total R-square and a regression R-square, the latter measuring the 
predictability associated with the explanatory variables apart from contributions due to 
autocorrelation. Of course in the current example, with no autoregressive lags specified, these are the 
same and P= and PM= create the same variable. The predicted values from PROC AUTOREG using
data set ALL are displayed in Output 1.8.

Output 1.8 Plotting Quarterly Sales Increase with Prediction

[Plot: QUARTERLY SALES INCREASE; vertical axis: change; horizontal axis: date]


Because this example shows no residual autocorrelation, analysis in PROC REG would be 
appropriate. Using the data set with the extended explanatory variables, add P and CLI to produce
predicted values and associated prediction intervals. 

PROC REG; 

MODEL CHANGE = T1 T2 S1 S2 S3 / P CLI;

TITLE "QUARTERLY SALES INCREASE"; 

RUN;

Output 1.9 Producing Forecasts and Prediction Intervals with the P and CLI Options in the Model Statement

QUARTERLY SALES INCREASE

Dependent Variable: CHANGE

Analysis of Variance

                     Sum of           Mean
Source      DF       Squares          Square          F Value    Prob>F
Model        5       62618900.984     12523780.197     97.063    0.0001
Error       41       5290127.6025     129027.5025
C Total     46       67909028.586

Root MSE    359.20398     R-square    0.9221
Dep Mean    280.25532     Adj R-sq    0.9126
C.V.        128.17026

Parameter Estimates

                   Parameter        Standard        T for H0:
Variable    DF     Estimate         Error           Parameter=0    Prob > |T|
INTERCEP     1     679.427278       200.12467417        3.395      0.0015
T1           1     -44.992888       16.44278429        -2.736      0.0091
T2           1     0.991520         0.31962710          3.102      0.0035
S1           1     -1725.832501     150.33120614      -11.480      0.0001
S2           1     1503.717849      146.84832151       10.240      0.0001
S3           1     -221.287056      146.69576462       -1.508      0.1391

Quarterly Sales Increase

        Dep Var     Predict    Std Err    Lower95%    Upper95%
Obs     CHANGE      Value      Predict    Predict     Predict     Residual
  1          .      -1090.4    195.006    -1915.8      -265.0            .
  2     1678.4       2097.1    172.102     1292.7      2901.5       -418.7
  3      633.2        332.1    163.658     -465.1      1129.3        301.2
  4      662.4        515.3    156.028     -275.6      1306.2        147.0
  5    -1283.6      -1246.6    153.619    -2035.6      -457.6     -37.0083
(more output lines)
 49          .       -870.4    195.006    -1695.9    -44.9848            .
 50          .       2412.3    200.125     1581.9      3242.7            .
 51          .        742.4    211.967   -99.8696      1584.8            .
 52          .       1020.9    224.417      165.5      1876.2            .
 53          .       -645.8    251.473    -1531.4       239.7            .
 54          .       2644.8    259.408     1750.0      3539.6            .
 55          .        982.9    274.992    69.2774      1896.5            .
 56          .       1269.2    291.006      335.6      2202.8            .

Sum of Residuals                         0
Sum of Squared Residuals      5290127.6025
Predicted Resid SS (Press)    7067795.5909


For observation 49 an increase in sales of -870.4 (i.e., a decrease) is predicted for the next quarter
with a 95% prediction interval extending from -1695.9 to -44.98. This is the typical after-Christmas sales
slump.

What does this sales change model say about the level of sales, and why were the levels of sales not
used in the analysis? First, notice that a cubic term in time, $bt^3$, when differenced becomes a quadratic
term: $bt^3 - b(t-1)^3 = b(3t^2 - 3t + 1)$. Thus a quadratic plus seasonal model in the differences is
associated with a cubic plus seasonal model in the levels. However, if the error term in the differences
satisfies the usual regression assumptions, which it seems to do for these data, then the error term in
the original levels can’t possibly satisfy them; the levels appear to have a nonstationary error term.
Ordinary regression statistics are invalid on the original level series. If you ignore this, the usual
(incorrect here) regression statistics indicate that a degree 8 polynomial is required to get a good fit.
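The claim that differencing a cubic leaves a quadratic is easy to confirm numerically. The Python sketch below checks the identity for a range of integer time points; the coefficient value 2 and the function names are arbitrary choices for illustration.

```python
def cubic(t, b):
    """Cubic trend term b * t^3 in the levels."""
    return b * t ** 3

def differenced_cubic(t, b):
    """First difference of the cubic term: value at t minus value at t - 1."""
    return cubic(t, b) - cubic(t - 1, b)

def quadratic_form(t, b):
    """The quadratic b * (3t^2 - 3t + 1) that the difference should equal."""
    return b * (3 * t * t - 3 * t + 1)

# Integer arithmetic, so the comparison is exact.
identity_holds = all(differenced_cubic(t, 2) == quadratic_form(t, 2) for t in range(1, 57))
```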

A plot of sales and the forecasts from polynomials of varying degree is shown in Output 1.10. The 
first thing to note is that the degree 8 polynomial, arrived at by inappropriate use of ordinary 
regression, gives a ridiculous forecast that extends vertically beyond the range of our graph just a few 
quarters into the future. The degree 3 polynomial seems to give a reasonable increase while the 
intermediate degree 6 polynomial actually forecasts a decrease. It is dangerous to forecast too far into 
the future using polynomials, especially those of high degree. Time series models specifically 
designed for nonstationary data will be discussed later. In summary, the differenced data seem to 
satisfy assumptions needed to justify regression. 


Output 1.10 Plotting Sales and Forecasts of Polynomials of Varying Degree

[Plot: NORTH CAROLINA RETAIL SALES IN MILLIONS $, QUARTERLY STARTING IN 1983; SYMBOL IS DEGREE OF FITTED POLYNOMIAL; vertical axis: qsales; horizontal axis: date, 1982Q1 to 1998Q1]

1.3.3 Regression with Transformed Data

Often, you analyze some transformed version of the data rather than the original data. The 
logarithmic transformation is probably the most common and is the only transformation discussed in 
this book. Box and Cox (1964) suggest a family of transformations and a method of using the data to 
select one of them. This is discussed in the time series context in Box and Jenkins (1976, 1994). 

Consider the following model: 

$$Y_t = \beta_0 \left(\beta_1^{X_t}\right) \varepsilon_t$$

Taking logarithms on both sides, you obtain

$$\log(Y_t) = \log(\beta_0) + \log(\beta_1) X_t + \log(\varepsilon_t)$$

Now if

$$\eta_t = \log(\varepsilon_t)$$

and if $\eta_t$ satisfies the standard regression assumptions, the regression of $\log(Y_t)$ on 1 and $X_t$
produces the best estimates of $\log(\beta_0)$ and $\log(\beta_1)$.

As before, if the data consist of $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$, you can append future known values
$X_{n+1}, X_{n+2}, \ldots, X_{n+s}$ to the data if they are available. Set $Y_{n+1}$ through $Y_{n+s}$ to missing values (.). Now use
the MODEL statement in PROC REG: 

MODEL LY=X / P CLI; 

where 

LY=LOG(Y); 

is specified in the DATA step. This produces predictions of future LY values and prediction limits 
for them. If, for example, you obtain an interval 

$$-1.13 < \log(Y_{n+s}) < 2.7$$

you can compute

$$\exp(-1.13) = .323$$

and

$$\exp(2.7) = 14.88$$

to conclude

$$.323 < Y_{n+s} < 14.88$$

Note that the original prediction interval had to be computed on the log scale, the only scale on which 
you can justify a t distribution or normal distribution. 
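The back-transformation of the interval is a single exponentiation of each endpoint. The Python sketch below reproduces the numbers in the example; the function name is an illustrative assumption.

```python
import math

def back_transform(lower_log, upper_log):
    """Convert prediction limits for log(Y) into limits for Y on the original scale."""
    return math.exp(lower_log), math.exp(upper_log)

lower, upper = back_transform(-1.13, 2.7)
```

Because the exponential function is increasing, the limits keep their order, but the resulting interval is no longer symmetric about the exponentiated prediction.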

When should you use logarithms? A quick check is to plot Y against X. When 

$$Y_t = \beta_0 \left(\beta_1^{X_t}\right) \varepsilon_t$$

the overall shape of the plot resembles that of

$$Y = \beta_0 \left(\beta_1^{X}\right)$$

See Output 1.11 for several examples of this type of plot. Note that the curvature in the plot
becomes more dramatic as $\beta_1$ moves away from 1 in either direction; the actual points are scattered
around the appropriate curve. Because the error term $\varepsilon$ is multiplied by $\beta_0\left(\beta_1^{X}\right)$, the variation
around the curve is greater at the higher points and lesser at the lower points on the curve.


Output 1.11 Plotting Exponential Curves



Output 1.12 shows a plot of U.S. Treasury bill rates against time. The curvature and especially the 
variability displayed are similar to those just described. In this case, you simply have $X_t = t$. A plot of
the logarithm of the rates appears in Output 1.13. Because this plot is straighter with more uniform
variability, you decide to analyze the logarithms.

Output 1.12 Plotting Ninety-Day Treasury Bill Rates

[Plot: CITIBASE/CITIBANK ECONOMIC DATABASE, 90-DAY TREASURY BILLS; horizontal axis: DATE]

Output 1.13 Plotting Ninety-Day Logged Treasury Bill Rates

[Plot: CITIBASE/CITIBANK ECONOMIC DATABASE, LOGGED 90-DAY TREASURY BILLS; vertical axis from 0.0 to 3.0; horizontal axis: DATE, JAN60 to JAN85]

To analyze and forecast the series with simple regression, you first create a data set with future
values of time: 

DATA TBILLS2; 

SET TBILLS END=EOF; 

TIME+1; 

OUTPUT; 

IF EOF THEN DO I=1 TO 24; 

LFYGM3=.; 

TIME+1; 

DATE=INTNX('MONTH',DATE,1); 

OUTPUT; 

END; 

DROP I; 

RUN; 

Output 1.14 shows the last 24 observations of the data set TBILLS2. You then regress the log T-bill
rate, LFYGM3, on TIME to estimate $\log(\beta_0)$ and $\log(\beta_1)$ in the following model:

$$\mathrm{LFYGM3} = \log(\beta_0) + \log(\beta_1) \cdot \mathrm{TIME} + \log(\varepsilon_t)$$

You also produce predicted values and check for autocorrelation by using these SAS statements: 

PROC REG DATA=TBILLS2; 

MODEL LFYGM3=TIME / DW P CLI; 

ID DATE; 

TITLE 'CITIBASE/CITIBANK ECONOMIC DATABASE'; 

TITLE2 'REGRESSION WITH TRANSFORMED DATA'; 

RUN; 

The result is shown in Output 1.15. 


Output 1.14 Displaying Future Date Values for U.S. Treasury Bill Data

CITIBASE/CITIBANK ECONOMIC DATABASE

OBS    DATE     LFYGM3    TIME
  1    NOV82         .     251
  2    DEC82         .     252
  3    JAN83         .     253
  4    FEB83         .     254
  5    MAR83         .     255
(More Output Lines)
 20    JUN84         .     270
 21    JUL84         .     271
 22    AUG84         .     272
 23    SEP84         .     273
 24    OCT84         .     274

Output 1.15 Producing Predicted Values and Checking Autocorrelation with the P, CLI, and DW Options in the MODEL Statement

CITIBASE/CITIBANK ECONOMIC DATABASE
REGRESSION WITH TRANSFORMED DATA

Dependent Variable: LFYGM3

Analysis of Variance

                      Sum of       Mean
Source      DF        Squares      Square      F Value    Prob>F
Model         1       32.68570     32.68570    540.633    0.0001
Error       248       14.99365     0.06046
C Total     249       47.67935

Root MSE     0.24588     R-square    0.6855
Dep Mean     1.74783     Adj R-sq    0.6843
C.V.        14.06788

Parameter Estimates

                   Parameter    Standard      T for H0:
Variable    DF     Estimate     Error         Parameter=0    Prob > |T|
INTERCEP     1     1.119038     0.03119550       35.872      0.0001
TIME         1     0.005010     0.00021548       23.252      0.0001

REGRESSION WITH TRANSFORMED DATA

                 Dep Var    Predict    Std Err    Lower95%    Upper95%
Obs    DATE      LFYGM3     Value      Predict    Predict     Predict     Residual
  1    JAN62     1.0006     1.1240     0.031      0.6359      1.6122      -0.1234
  2    FEB62     1.0043     1.1291     0.031      0.6410      1.6171      -0.1248
  3    MAR62     1.0006     1.1341     0.031      0.6460      1.6221      -0.1334
  4    APR62     1.0043     1.1391     0.030      0.6511      1.6271      -0.1348
  5    MAY62     0.9858     1.1441     0.030      0.6562      1.6320      -0.1583
(More Output Lines)
251    NOV82          .     2.3766     0.031      1.8885      2.8648            .
(More Output Lines)
270    JUN84          .     2.4718     0.035      1.9827      2.9609            .
271    JUL84          .     2.4768     0.035      1.9877      2.9660            .
272    AUG84          .     2.4818     0.035      1.9926      2.9711            .
273    SEP84          .     2.4868     0.035      1.9976      2.9761            .
274    OCT84          .     2.4919     0.036      2.0025      2.9812            .

Sum of Residuals                    0
Sum of Squared Residuals      14.9936
Predicted Resid SS (Press)    15.2134

DURBIN-WATSON D                0.090
(FOR NUMBER OF OBS.)             250
1ST ORDER AUTOCORRELATION      0.951

Now, for example, you compute:

$$1.119 - (1.96)(0.0312) < \log(\beta_0) < 1.119 + (1.96)(0.0312)$$

Thus,

$$2.880 < \beta_0 < 3.255$$

is a 95% confidence interval for $\beta_0$. Similarly, you obtain

$$1.0046 < \beta_1 < 1.0054$$

which is a 95% confidence interval for $\beta_1$. The growth rate of Treasury bills is estimated from this
model to be between 0.46% and 0.54% per time period. Your forecast for November 1982 can be
obtained from

$$1.888 < 2.377 < 2.865$$

so that

$$6.61 < \mathrm{FYGM3}_{251} < 17.55$$

is a 95% prediction interval for the November 1982 yield and

$$\exp(2.377) = 10.77$$

is the predicted value. Because the distribution on the original levels is highly skewed, the prediction
10.77 does not lie midway between 6.61 and 17.55, nor would you want it to do so.

Note that the Durbin-Watson statistic is $d = 0.090$. However, because $n = 250$ is beyond the range
of the Durbin-Watson tables, you use $\hat{\rho} = 0.951$ to compute

$$n^{1/2}\hat{\rho}/(1-\hat{\rho}^2)^{1/2} = 48.63$$

which is greater than 1.645. At the 5% level, you can conclude that positive autocorrelation is present 
(or that your model is misspecified in some other way). This is also evident in the plot, in Output 
1.13, in which the data fluctuate around the overall trend in a clearly dependent fashion. Therefore, 
you should recompute your forecasts and confidence intervals using some of the methods in this 
book that consider autocorrelation. 
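The interval and test calculations in this example can be reproduced with a few lines of arithmetic. The Python sketch below uses the rounded estimates quoted in the text and in Output 1.15; the variable names are illustrative assumptions.

```python
import math

# Estimates and standard errors on the log scale, as quoted in the text.
b0, se_b0 = 1.119, 0.0312
b1, se_b1 = 0.005010, 0.00021548
z = 1.96  # approximate 95% normal critical value

# Exponentiate the log-scale confidence limits to get limits for beta_0 and beta_1.
ci_beta0 = (math.exp(b0 - z * se_b0), math.exp(b0 + z * se_b0))
ci_beta1 = (math.exp(b1 - z * se_b1), math.exp(b1 + z * se_b1))

# November 1982 (observation 251): prediction and limits for LFYGM3, back-transformed.
prediction = math.exp(2.377)
prediction_interval = (math.exp(1.888), math.exp(2.865))

# Large-sample check for positive autocorrelation, compared with 1.645.
n, rho_hat = 250, 0.951
autocorr_stat = math.sqrt(n) * rho_hat / math.sqrt(1 - rho_hat ** 2)
```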

Suppose $X = \log(Y)$ and X is normal with mean $M_x$ and variance $\sigma_x^2$. Then $Y = \exp(X)$ and Y has
median $\exp(M_x)$ and mean $\exp(M_x + \tfrac{1}{2}\sigma_x^2)$. For this reason, some authors suggest adding half the
error variance to a log scale forecast prior to exponentiation. We prefer to simply exponentiate and
think of the result, for example, exp(2.377) = 10.77, as an estimate of the median, reasoning that this
is a more credible central estimate for such a highly skewed distribution.

2.1 Introduction


2.1.1 Terminology and Notation 

Often you can forecast a series $Y_t$ simply based on past values $Y_{t-1}, Y_{t-2}, \ldots$. For example, suppose $Y_t$
satisfies

$$Y_t - \mu = \rho(Y_{t-1} - \mu) + e_t \qquad (2.1)$$

where $e_t$ is a sequence of uncorrelated $N(0, \sigma^2)$ variables. The term for such an $e_t$ sequence is
white noise.

Assuming equation 2.1 holds at all times t, you can write, for example,

$$Y_{t-1} - \mu = \rho(Y_{t-2} - \mu) + e_{t-1}$$

and when you substitute in equation 2.1, you obtain

$$Y_t - \mu = e_t + \rho e_{t-1} + \rho^2 (Y_{t-2} - \mu)$$

When you continue like this, you obtain

$$Y_t - \mu = e_t + \rho e_{t-1} + \rho^2 e_{t-2} + \cdots + \rho^{t-1} e_1 + \rho^t (Y_0 - \mu) \qquad (2.2)$$

If you assume $|\rho| < 1$, the effect of the series values before you started collecting data ($Y_0$, for
example) is minimal. Furthermore, you see that the mean (expected value) of $Y_t$ is $\mu$.

Suppose the variance of $Y_{t-1}$ is $\sigma^2/(1 - \rho^2)$. Then the variance of

$$\rho(Y_{t-1} - \mu) + e_t$$

is

$$\rho^2\sigma^2/(1 - \rho^2) + \sigma^2 = \sigma^2/(1 - \rho^2)$$

which shows that the variance of $Y_t$ is also $\sigma^2/(1 - \rho^2)$.


2.1.2 Statistical Background 

You can define $Y_t$ as an accumulation of past shocks $e_t$ to the system by writing the mathematical
model shown in equation 2.1, model 1, as

$$Y_t = \mu + \sum_{j=0}^{\infty} \rho^j e_{t-j} \qquad (2.3)$$

that is, by extending equation 2.2 back into the infinite past. This again shows that if $|\rho| < 1$ the
effect of shocks in the past is minimal. Equation 2.3, in which the series is expressed in terms of a
mean and past shocks, is often called the “Wold representation” of the series. You can also compute
a covariance between $Y_t$ and $Y_{t-j}$ from equation 2.3. Calling this covariance

$$\gamma(j) = \mathrm{cov}(Y_t, Y_{t-j})$$

you have

$$\gamma(j) = \rho^{|j|}\sigma^2/(1 - \rho^2) = \rho^{|j|}\,\mathrm{var}(Y_t)$$

An interesting feature is that $\gamma(j)$ does not depend on t. In other words, the covariance between $Y_t$
and $Y_s$ depends only on the time distance $|t - s|$ between these observations and not on the values t
and s.

Why emphasize variances and covariances? They determine which model is appropriate for your 
data. One way to determine when model 1 is appropriate is to compute estimates of the covariances 
of your data and determine if they are of the given form—that is, if they decline exponentially at rate 
$\rho$ as lag j increases. Suppose you observe this $\gamma(j)$ sequence:

$$\gamma(0) = 243,\ \gamma(1) = 162,\ \gamma(2) = 108,\ \gamma(3) = 72,\ \gamma(4) = 48,\ \gamma(5) = 32,\ \gamma(6) = 21.3, \ldots$$

You know the variance of your process, which is

$$\mathrm{var}(Y_t) = \gamma(0) = 243$$

and you note that

$$\gamma(1)/\gamma(0) = 2/3$$

Also,

$$\gamma(2)/\gamma(1) = 2/3$$

and, in fact,

$$\gamma(j)/\gamma(j-1) = 2/3$$

all the way through the sequence. Thus, you decide that model 1 is appropriate and that $\rho = 2/3$.
Because

$$\gamma(0) = \sigma^2/(1 - \rho^2)$$

you also know that

$$\sigma^2 = (1 - (2/3)^2)(243) = 135$$
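The ratio check and the resulting shock variance can be verified with exact fractions. The Python sketch below uses the first six autocovariances from the example; the variable names are illustrative assumptions.

```python
from fractions import Fraction

# Autocovariances gamma(0), ..., gamma(5) from the example (gamma(6) = 21.3 is rounded, so it is left out).
gammas = [243, 162, 108, 72, 48, 32]

# Successive ratios gamma(j) / gamma(j - 1); a constant ratio indicates exponential decay.
ratios = [Fraction(gammas[j], gammas[j - 1]) for j in range(1, len(gammas))]
rho = ratios[0]

# Solve gamma(0) = sigma^2 / (1 - rho^2) for the shock variance.
sigma2 = (1 - rho ** 2) * gammas[0]
```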

2.2 Forecasting 

How does your knowledge of $\rho$ help you forecast? Suppose you know $\mu = 100$ (in practice, you use
an estimate like the mean, $\bar{Y}$, of your observations). If you have data up to time n, you know that in
the discussion above

$$Y_{n+1} - 100 = (2/3)(Y_n - 100) + e_{n+1}$$

At time n, $e_{n+1}$ has not occurred and is not correlated with anything that has occurred up to time n.
You forecast $e_{n+1}$ by its unconditional mean 0. Because $Y_n$ is available, it is easy to compute the
forecast of $Y_{n+1}$ as

$$\hat{Y}_{n+1} = 100 + (2/3)(Y_n - 100)$$

and the forecast error as

$$Y_{n+1} - \hat{Y}_{n+1} = e_{n+1}$$

Similarly,

$$Y_{n+2} - 100 = (2/3)(Y_{n+1} - 100) + e_{n+2} = (2/3)[(2/3)(Y_n - 100) + e_{n+1}] + e_{n+2}$$

and you forecast $Y_{n+2}$ as

$$100 + (2/3)^2 (Y_n - 100)$$

with forecast error

$$e_{n+2} + (2/3)e_{n+1}$$

Similarly, for a general $\mu$ and $\rho$, a forecast L steps into the future is

$$\mu + \rho^L (Y_n - \mu)$$

with error

$$e_{n+L} + \rho e_{n+L-1} + \cdots + \rho^{L-1} e_{n+1}$$

A forecasting strategy now becomes clear. You do the following:

1. Examine estimates of the autocovariances $\gamma(j)$ to see if they decrease exponentially.

2. If so, assume model 1 holds and estimate $\mu$ and $\rho$.

3. Calculate the prediction

$$\hat{Y}_{n+L} = \mu + \rho^L (Y_n - \mu)$$

and the forecast error variance

$$\sigma^2 (1 + \rho^2 + \rho^4 + \cdots + \rho^{2L-2})$$

You must substitute estimates, like $\hat{\rho}$, for your parameters.


For example, if $\mu = 100$, $\rho = 2/3$, and $Y_n = 127$, the forecasts become 118, 112, 108, 105.3, 103.6,
102.4. The forecast error variances, based on

$$\mathrm{var}(e) = \sigma^2 = 135$$

become 135, 195, 221.7, 233.5, 238.8, and 241.1. The forecasts decrease exponentially at rate
$\rho = 2/3$ to the series mean $\mu = 100$. The forecast error variance converges to the series variance

$$\sigma^2/(1 - \rho^2) = 135/(1 - (2/3)^2) = 243 = \gamma(0)$$

This shows that an equation like equation 2.1 helps you forecast in the short run, but you may as well
use the series mean to forecast a stationary series far into the future.
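The forecasts and forecast error variances in this example follow mechanically from the two formulas in the forecasting strategy. The Python sketch below reproduces them; the function name `ar1_forecasts` and its argument names are illustrative assumptions.

```python
def ar1_forecasts(mu, rho, y_n, sigma2, steps):
    """Return (forecast, error variance) pairs for leads 1..steps of model 1."""
    results = []
    variance = 0.0
    for lead in range(1, steps + 1):
        # Each extra lead adds one more shock term, rho^(2(lead-1)) * sigma^2, to the variance.
        variance += sigma2 * rho ** (2 * (lead - 1))
        forecast = mu + rho ** lead * (y_n - mu)
        results.append((forecast, variance))
    return results

pairs = ar1_forecasts(mu=100.0, rho=2.0 / 3.0, y_n=127.0, sigma2=135.0, steps=6)
forecasts = [round(f, 1) for f, _ in pairs]
variances = [round(v, 1) for _, v in pairs]
```

Increasing `steps` shows the forecasts approaching 100 and the variances approaching 243, the series variance.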

In this section $Y_t - \mu = \rho(Y_{t-1} - \mu) + e_t$ was expanded as an infinite sum of past shocks $e_t$, showing
how past shocks accumulate to determine the current deviation of Y from the mean. At time n + L
this expansion was

$$Y_{n+L} - \mu = \{e_{n+L} + \rho e_{n+L-1} + \cdots + \rho^{L-1} e_{n+1}\} + \rho^L \{e_n + \rho e_{n-1} + \cdots\}$$

which, substituting $Y_n - \mu = e_n + \rho e_{n-1} + \cdots$, shows that

1. The best (minimum prediction error variance) prediction of $Y_{n+L}$ is $\mu + \rho^L (Y_n - \mu)$.

2. The error in that prediction is $\{e_{n+L} + \rho e_{n+L-1} + \cdots + \rho^{L-1} e_{n+1}\}$, so the prediction
error variance is $\sigma^2 [1 + \rho^2 + \rho^4 + \cdots + \rho^{2L-2}]$.

3. Shocks that happened a long time ago have little effect on the present
Y if $|\rho| < 1$.

The future shocks (e’s) in item 2 have not yet occurred, but from the historic residuals an estimate of
$\sigma^2$ can be obtained so that the error variance can be estimated and a prediction interval calculated. It
will be shown that for a whole class of models called ARMA models, such a decomposition of $Y_{n+L}$
into a prediction that is a function of current and past Y’s, plus a prediction error that is a linear
combination of future shocks (e’s) is possible. The coefficients in these expressions are functions of
the model parameters, like $\rho$, which can be estimated.


2.2.1 Forecasting with PROC ARIMA 

As an example, 200 data values $Y_1, Y_2, \ldots, Y_{200}$ with mean $\bar{Y} = 90.091$ and last observation
$Y_{200} = 140.246$ are analyzed with these statements:

PROC ARIMA DATA=EXAMPLE; 

IDENTIFY VAR=Y CENTER; 

ESTIMATE P=1 NOCONSTANT; 

FORECAST LEAD=5; 

RUN;

Output 2.1 shows the results when you use PROC ARIMA to identify, estimate, and forecast. The
CENTER option tells PROC ARIMA to use the series mean, $\bar{Y}$, to estimate $\mu$. The estimates of
$\gamma(j)$ are called covariances, with j labeled LAG on the printout. The covariances 1199, 956,
709, 524, 402, 309, . . . decrease at a rate of about .8. Dividing each covariance by the variance
1198.54 (covariance at lag 0) gives the estimated sequence of correlations. (The correlation at lag 0
is always $\rho(0) = 1$ and in general $\rho(j) = \gamma(j)/\gamma(0)$.) The correlation plot shows roughly exponential
decay. The ESTIMATE statement produces an estimate $\hat{\rho} = 0.80575$ that you can test for
significance with the t ratio. Since t = 18.91 exceeds the 5% critical value, $\hat{\rho}$ is significant. If $\rho$ were
0, this t would have approximately a standard normal distribution in large samples. Thus a t
exceeding 1.96 in magnitude would be considered significant at about the 5% level. Also, you have
an estimate of $\sigma^2$, 430.7275. You forecast $Y_{201}$ by

$$90.091 + .80575(140.246 - 90.091) = 130.503$$

with forecast standard error

$$(430.73)^{.5} = 20.754$$

Next, you forecast $Y_{202}$ by

$$90.091 + (.80575)^2 (140.246 - 90.091) = 122.653$$

with forecast standard error

$$(430.73(1 + .80575^2))^{.5} = 26.6528$$
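These hand calculations can be checked with a short script. The Python sketch below plugs the estimates quoted in the text into the lead-L forecast and standard error formulas for model 1; it is an arithmetic check under those formulas, not a description of PROC ARIMA's internals, and the names are illustrative assumptions.

```python
import math

# Estimates quoted in the text.
y_bar, rho_hat = 90.091, 0.80575
y_last, sigma2_hat = 140.246, 430.73

def forecast(lead):
    """Lead-L forecast: mean plus rho^L times the last deviation from the mean."""
    return y_bar + rho_hat ** lead * (y_last - y_bar)

def std_error(lead):
    """Square root of sigma^2 * (1 + rho^2 + ... + rho^(2L-2))."""
    return math.sqrt(sigma2_hat * sum(rho_hat ** (2 * k) for k in range(lead)))

one_step = (forecast(1), std_error(1))
two_step = (forecast(2), std_error(2))
```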


Output 2.1 Using PROC ARIMA to Identify, Estimate, and Forecast 

                         The ARIMA Procedure 

                       Name of Variable = Y 

                 Mean of Working Series          0 
                 Standard Deviation       34.61987 
                 Number of Observations        200 

                          Autocorrelations 

Lag  Covariance  Correlation  -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1  Std Error 
  0    1198.535      1.00000  |                    |********************|         0 
  1     955.584      0.79729  |                    |****************    |  0.070711 
  2     708.551      0.59118  |                    |************        |  0.106568 
  3     524.036      0.43723  |                    |*********           |  0.121868 
  4     402.374      0.33572  |                    |*******             |  0.129474 
  5     308.942      0.25777  |                    |*****               |  0.133755 
                         (More Output Lines) 
 19  -47.371450      -.03952  |                   *|                    |  0.147668 
 20  -82.867591      -.06914  |                   *|                    |  0.147720 
 21    -140.527      -.11725  |                  **|                    |  0.147882 
 22    -113.545      -.09474  |                  **|                    |  0.148346 
 23  -88.683505      -.07399  |                   *|                    |  0.148648 
 24  -50.803423      -.04239  |                   *|                    |  0.148832 

                     "." marks two standard errors 

                      Inverse Autocorrelations 

Lag  Correlation  -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1 
  1     -0.57652  |        ************|                    | 
  2      0.09622  |                    |**                  | 
  3      0.02752  |                    |*                   | 
  4     -0.07210  |                   *|                    | 
  5      0.04054  |                    |*                   | 
                         (More Output Lines) 
 19      0.11277  |                    |**                  | 
 20     -0.18424  |                ****|                    | 
 21      0.20996  |                    |****                | 
 22     -0.12659  |                 ***|                    | 
 23      0.04992  |                    |*                   | 
 24     -0.01756  |                    |                    | 

                      Partial Autocorrelations 

Lag  Correlation  -1 9 8 7 6 5 4 3 2 1 0 1 2 3 4 5 6 7 8 9 1 
  1      0.79729  |                    |****************    | 
  2     -0.12213  |                  **|                    | 
  3      0.01587  |                    |                    | 
  4      0.03245  |                    |*                   | 
  5     -0.00962  |                    |                    | 
                         (More Output Lines) 
 19     -0.06557  |                   *|                    | 
 20      0.01500  |                    |                    | 
 21     -0.10473  |                  **|                    | 
 22      0.14816  |                    |***                 | 
 23     -0.03625  |                   *|                    | 
 24      0.03510  |                    |*                   | 


                  Autocorrelation Check for White Noise 

 To     Chi-           Pr > 
Lag   Square    DF    ChiSq   ---------------Autocorrelations--------------- 
  6   287.25     6   <.0001    0.797   0.591   0.437   0.336   0.258   0.227 
 12   342.46    12   <.0001    0.240   0.261   0.243   0.198   0.157   0.111 
 18   345.17    18   <.0001    0.071   0.054   0.042   0.037   0.036   0.008 
 24   353.39    24   <.0001   -0.040  -0.069  -0.117  -0.095  -0.074  -0.042 

                  Conditional Least Squares Estimation 

                          Standard              Approx 
Parameter    Estimate        Error   t Value   Pr > |t|   Lag 
AR1,1         0.80575      0.04261     18.91     <.0001     1 

                  Variance Estimate        430.7275 
                  Std Error Estimate       20.75397 
                  AIC                      1781.668 
                  SBC                      1784.966 
                  Number of Residuals           200 
            * AIC and SBC do not include log determinant. 

                  Autocorrelation Check of Residuals 

 To     Chi-           Pr > 
Lag   Square    DF    ChiSq   ---------------Autocorrelations--------------- 
  6     5.46     5   0.3623    0.103  -0.051  -0.074  -0.020   0.063  -0.060 
 12     9.46    11   0.5791    0.014   0.110   0.074   0.007   0.034  -0.002 
 18    11.30    17   0.8406   -0.048  -0.002   0.007  -0.001   0.065   0.042 
 24    20.10    23   0.6359   -0.042   0.043  -0.185  -0.006   0.032  -0.005 
 30    24.49    29   0.7043    0.064  -0.007   0.033   0.028   0.098  -0.056 
 36    27.06    35   0.8290    0.029   0.029   0.074   0.002   0.036  -0.046 

                         The ARIMA Procedure 

                        Model for variable Y 

       Data have been centered by subtracting the value 90.09064 
                     No mean term in this model. 

                       Autoregressive Factors 
                   Factor 1:  1 - 0.80575 B**(1) 

                      Forecasts for variable Y 

Obs    Forecast   Std Error    95% Confidence Limits 
201    130.5036     20.7540     89.8265     171.1806 
202    122.6533     26.6528     70.4149     174.8918 
203    116.3280     29.8651     57.7936     174.8625 
204    111.2314     31.7772     48.9492     173.5136 
205    107.1248     32.9593     42.5257     171.7239 


In the manner previously illustrated, PROC ARIMA produced the forecasts and standard errors. 
The coefficients are estimated through the least squares (LS) method. This means that 

$$.80575 = \sum (Y_t - \bar{Y})(Y_{t-1} - \bar{Y}) \Big/ \sum (Y_{t-1} - \bar{Y})^2$$

where $\bar{Y}$ is the mean of the data set and the sums run from 2 to 200. One alternative estimation 
scheme is the maximum-likelihood (ML) method and another is unconditional least squares (ULS). 
A discussion of these methods in the context of the autoregressive order 1 model follows. The 
likelihood function for a set of observations is simply their joint probability density viewed as a 
function of the parameters. The first observation $Y_1$ is normal with mean $\mu$ and variance 
$\sigma^2/(1 - \rho^2)$. Its probability density function is 

$$\frac{\sqrt{1-\rho^2}}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(Y_1-\mu)^2 (1-\rho^2)}{2\sigma^2}\right)$$

For the rest of the observations, $t = 2, 3, 4, \ldots$, it is most convenient to note that 
$e_t = Y_t - \rho Y_{t-1}$ has a normal distribution with mean $\mu - \rho\mu = (1 - \rho)\mu$ and variance $\sigma^2$. 

Each of these probability densities is thus given by 

$$\frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{[(Y_t-\mu) - \rho(Y_{t-1}-\mu)]^2}{2\sigma^2}\right)$$

Because $Y_1, e_2, e_3, \ldots, e_n$ are independent, the joint likelihood is the product of these $n$ 
probability density functions, namely 

$$\frac{\sqrt{1-\rho^2}}{(\sqrt{2\pi}\,\sigma)^n} \exp\left(-\frac{(1-\rho^2)(Y_1-\mu)^2 + [(Y_2-\mu) - \rho(Y_1-\mu)]^2 + \cdots + [(Y_n-\mu) - \rho(Y_{n-1}-\mu)]^2}{2\sigma^2}\right)$$

Now substituting the observations for Y in the expression above produces an expression involving 
$\mu$, $\rho$, and $\sigma^2$. Viewed in this way, the expression above is called the likelihood function for the 
data and clearly depends on assumptions about the model form. Using calculus, it can be shown that the 
estimate of $\sigma^2$ that maximizes the likelihood is USS/n, where USS represents the unconditional sum 
of squares: 

$$\mathrm{USS} = (1-\rho^2)(Y_1-\mu)^2 + [(Y_2-\mu) - \rho(Y_1-\mu)]^2 + \cdots + [(Y_n-\mu) - \rho(Y_{n-1}-\mu)]^2$$

The estimates that minimize USS are the unconditional least squares (ULS) estimates; that is, USS 
is the objective function to be minimized by the ULS method. The minimization can be modified as 
in the current example by inserting $\bar{Y}$ in place of $\mu$, leaving only $\rho$ to be estimated. 

The conditional least squares (CLS) method results from assuming that $Y_0$ and all other Ys that 
occurred before we started observing the series are equal to the mean. Thus it minimizes a slightly 
different objective function, 

$$(Y_1-\mu)^2 + [(Y_2-\mu) - \rho(Y_1-\mu)]^2 + \cdots + [(Y_n-\mu) - \rho(Y_{n-1}-\mu)]^2$$

and, as with the other methods, it can be modified by inserting $\bar{Y}$ in place of $\mu$, leaving only $\rho$ 
to be estimated. The first term cannot be changed by manipulating $\rho$, so the CLS method with $\bar{Y}$ 
inserted also minimizes 

$$[(Y_2-\bar{Y}) - \rho(Y_1-\bar{Y})]^2 + \cdots + [(Y_n-\bar{Y}) - \rho(Y_{n-1}-\bar{Y})]^2$$

In other words, the CLS estimate of $\rho$ could be obtained by regressing deviations from the sample 
mean on their lags with no intercept in this simple centered case. 

If full maximum likelihood estimation is desired, the expression USS/n is substituted for $\sigma^2$ in the 
likelihood function and the resulting expression, called a concentrated likelihood, is maximized. The 
log of the likelihood is 

$$-(n/2)\log(2\pi/n) - (n/2) - (n/2)\log(\mathrm{USS}) + (1/2)\log(1-\rho^2)$$

The ML method can be run on centered data by inserting $Y_t - \bar{Y}$ in USS in place of $Y_t - \mu$. 
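
As a sketch of where the concentrated log likelihood comes from (a routine derivation under the normality assumption already made, with $L$ used here as shorthand for the joint likelihood), take logs of the joint density and then substitute USS/n for $\sigma^2$:

```latex
\begin{aligned}
\log L(\mu,\rho,\sigma^2)
  &= \tfrac{1}{2}\log(1-\rho^2) - \tfrac{n}{2}\log(2\pi)
     - \tfrac{n}{2}\log\sigma^2 - \frac{\mathrm{USS}}{2\sigma^2} \\
\log L(\mu,\rho,\mathrm{USS}/n)
  &= \tfrac{1}{2}\log(1-\rho^2) - \tfrac{n}{2}\log(2\pi)
     - \tfrac{n}{2}\log(\mathrm{USS}/n) - \tfrac{n}{2} \\
  &= -\tfrac{n}{2}\log(2\pi/n) - \tfrac{n}{2}
     - \tfrac{n}{2}\log(\mathrm{USS}) + \tfrac{1}{2}\log(1-\rho^2)
\end{aligned}
```

The term $(1/2)\log(1-\rho^2)$ is what distinguishes maximizing the likelihood from simply minimizing USS.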

For the series 14 15 14 10 12 10 5 6 6 8 the sample average is 10. The three rows in Output 2.2 
display the objective functions just discussed for conditional least squares, unconditional least 
squares, and maximum likelihood for an autoregressive order 1 model fit to these data. The negative 
of the likelihood is shown so that a minimum is sought in each case. The right panel in each row 
plots the function to be minimized over a floor of $(\rho, \mu)$ pairs, with each function truncated by a 
convenient ceiling plane. Crosshairs in the plot floors indicate the minimizing values, and it is seen 
that these estimates can vary somewhat from method to method when the sample size is very small. 
Each plot also shows a vertical slicing plane at $\mu = 10$, corresponding to the sample mean. The left 
plots show the cross section from the slicing planes. These then are the objective functions to be 
minimized when the sample mean, 10, is used as an estimate of the population mean. The slicing 
plane does not meet the floor at the crosshair mark, so the sample mean differs somewhat from the 
estimate that minimizes the objective function. Likewise the $\rho$ that minimizes the cross section plot 
is not the same as the one minimizing the surface plot, although this difference is quite minor for ULS 
and ML in this small example. 
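
For this ten-value series, the centered CLS estimate can be checked with an ordinary no-intercept regression of deviations on lagged deviations. The following is a minimal sketch; the data set name SMALL and the variable names D and LD are illustrative assumptions.

```sas
/* Sketch: centered CLS for the ten-value series via regression.   */
DATA SMALL;
   INPUT Y @@;
   D  = Y - 10;      /* deviation from the sample mean, 10          */
   LD = LAG(D);      /* lagged deviation (missing for the first Y)  */
   CARDS;
14 15 14 10 12 10 5 6 6 8
;
RUN;

PROC REG DATA=SMALL;
   MODEL D = LD / NOINT;   /* no intercept: data are already centered */
RUN;
```

By hand, the slope is the sum of lagged cross products over the sum of squared lagged deviations, 84/118 = 0.7119. The regression's error sum of squares, about 46.20, omits the first term $(Y_1 - \bar{Y})^2 = 16$ of the CLS objective function; adding it gives about 62.20.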
Output 2.2 Objective Functions 

The minimizing values for the right-side ULS plot are obtained from the code 

PROC ARIMA DATA=ESTIMATE; 
   IDENTIFY VAR=Y NOPRINT; 
   ESTIMATE P=1 METHOD=ULS OUTEST=OUTULS PRINTALL; 
RUN; 

with METHOD=ML for maximum likelihood and no method specification for CLS. 

The OUTEST data set holds the estimates and related information, and PRINTALL shows the 
iterative steps used to search for the minima. The use of the CENTER option in the IDENTIFY 
statement along with the NOCONSTANT option in the ESTIMATE statement will produce the $\rho$ 
estimate that minimizes the objective function computed with the sample mean (10). A partial output 
showing the iterations for our small series is shown in Output 2.3. The second column in each 
segment is the objective function that is being minimized and should end with the height of the 
lowest point in each plot. The estimates correspond to the coordinate(s) on the horizontal axis (or the 
floor) corresponding to the minimum. 
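
A minimal sketch of the centered variant just described, shown for ULS and assuming the same ESTIMATE data set as above:

```sas
/* Sketch: fix the mean at the sample mean and estimate only the    */
/* AR(1) coefficient. CENTER subtracts the sample mean;             */
/* NOCONSTANT keeps a mean term out of the fitted model.            */
PROC ARIMA DATA=ESTIMATE;
   IDENTIFY VAR=Y CENTER NOPRINT;
   ESTIMATE P=1 NOCONSTANT METHOD=ULS PRINTALL;
RUN;
```

Changing METHOD=ULS to METHOD=ML, or dropping the METHOD= option, gives the corresponding centered ML and CLS fits.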


Output 2.3 Using PROC ARIMA to Get Iterations for Parameter Estimates 

                 Conditional Least Squares Estimation 

Iteration       SSE        MU     AR1,1    Constant    Lambda      R Crit 
     0     62.26767   10.0000    0.6885    3.114754   0.00001           1 
     1     58.98312   11.2357    0.7053    3.310709      1E-6    0.216536 
     2     57.57647   11.4867    0.7957    2.347267      1E-7      0.1318 
     3     56.72048   12.1199    0.8211    2.168455      1E-8      0.0988 
     4     56.17147   12.4242    0.8643    1.686007      1E-9    0.074987 
     5     55.81877   12.8141    0.8858    1.463231     1E-10    0.060663 
     6     55.62370   13.0412    0.9073    1.209559     1E-11    0.045096 
     7     55.52790   13.2380    0.9191    1.070524     1E-12     0.03213 
     8     55.48638   13.3505    0.9282    0.959077     1E-12    0.021304 
     9     55.46978   13.4302    0.9332    0.897121     1E-12    0.013585 
    10     55.46351   13.4751    0.9366    0.854795     1E-12    0.008377 
    11     55.46123   13.5040    0.9385    0.831106     1E-12    0.005073 
    12     55.46041   13.5205    0.9396    0.816055     1E-12    0.003033 
    13     55.46013   13.5306    0.9403    0.807533     1E-12    0.001801 
    14     55.46003   13.5364    0.9407    0.802304     1E-12    0.001065 
    15     55.45999   13.5399    0.9410     0.79931     1E-12    0.000628 

                 Conditional Least Squares Estimation 

Iteration       SSE        MU     AR1,1    Constant    Lambda      R Crit 
     0     62.26767   10.0000    0.6885    3.114754   0.00001           1 
     1     58.98312   11.2357    0.7053    3.310709      1E-6    0.216536 
     2     57.57647   11.4867    0.7957    2.347267      1E-7      0.1318 
     3     56.72048   12.1199    0.8211    2.168455      1E-8      0.0988 
     4     56.17147   12.4242    0.8643    1.686007      1E-9    0.074987 

                Unconditional Least Squares Estimation 

Iteration       SSE        MU     AR1,1    Constant    Lambda      R Crit 
     0     54.31645   12.4242    0.8643    1.686007   0.00001           1 
     1     52.98938   10.6007    0.8771    1.302965      1E-6    0.165164 
     2     52.77450   10.8691    0.8370    1.771771      1E-7    0.065079 
     3     52.70017   10.5107    0.8357    1.726955      1E-8    0.036672 
     4     52.68643   10.5479    0.8262    1.833486      1E-9     0.01526 
     5     52.68382   10.4905    0.8254    1.831752     1E-10    0.006528 
     6     52.68330   10.4928    0.8237    1.849434     1E-11     0.00267 
     7     52.68321   10.4841    0.8235    1.850351     1E-12    0.001104 
     8     52.68319   10.4838    0.8232    1.853158     1E-12    0.000458 

                 Conditional Least Squares Estimation 

Iteration       SSE        MU     AR1,1    Constant    Lambda      R Crit 
     0     62.26767   10.0000    0.6885    3.114754   0.00001           1 
     1     58.98312   11.2357    0.7053    3.310709      1E-6    0.216536 
     2     57.57647   11.4867    0.7957    2.347267      1E-7      0.1318 
     3     56.72048   12.1199    0.8211    2.168455      1E-8      0.0988 
     4     56.17147   12.4242    0.8643    1.686007      1E-9    0.074987 

                    Maximum Likelihood Estimation 

  Iter      Loglike        MU     AR1,1    Constant    Lambda      R Crit 
     0    -23.33779   12.4242    0.8643    1.686007   0.00001           1 
     1    -22.97496   10.3212    0.7696    2.378179      1E-6    0.233964 
     2    -22.96465   10.5093    0.7328    2.808362      1E-7    0.058455 
     3    -22.96211   10.3352    0.7438    2.647467      1E-8    0.028078 
     4    -22.96176   10.3827    0.7374    2.726623      1E-9    0.010932 
     5    -22.96169   10.3548    0.7397     2.69579     1E-10    0.004795 
     6    -22.96168   10.3648    0.7385    2.709918     1E-11    0.002018 
     7    -22.96168   10.3600    0.7390     2.70409     1E-12     0.00087 

                 Conditional Least Squares Estimation 

Iteration       SSE     AR1,1    Lambda      R Crit 
     0     62.26767    0.6885   0.00001           1 
     1     62.20339    0.7119      1E-6     0.03213 
     2     62.20339    0.7119      1E-7    3.213E-7 

                 Conditional Least Squares Estimation 

Iteration       SSE     AR1,1    Lambda      R Crit 
     0     62.26767    0.6885   0.00001           1 
     1     62.20339    0.7119      1E-6     0.03213 
     2     62.20339    0.7119      1E-7    3.213E-7 

                Unconditional Least Squares Estimation 

Iteration       SSE     AR1,1    Lambda      R Crit 
     0     54.09537    0.7119   0.00001           1 
     1     52.89708    0.7967      1E-6    0.133727 
     2     52.82994    0.8156      1E-7    0.031428 
     3     52.82410    0.8212      1E-8    0.009357 
     4     52.82357    0.8229      1E-9    0.002985 
     5     52.82353    0.8235     1E-10    0.000973 

                 Conditional Least Squares Estimation 

Iteration       SSE     AR1,1    Lambda      R Crit 
     0     62.26767    0.6885   0.00001           1 
     1     62.20339    0.7119      1E-6     0.03213 
     2     62.20339    0.7119      1E-7    3.213E-7 

                    Maximum Likelihood Estimation 

  Iter      Loglike     AR1,1    Lambda      R Crit 
     0    -22.98357    0.7119   0.00001           1 
     1    -22.97472    0.7389      1E-6    0.042347 
     2    -22.97471    0.7383      1E-7    0.001059 
     3    -22.97471    0.7383      1E-8    0.000027 


Notice that each method begins with conditional least squares starting with the sample mean and an 
estimate, .6885, of the autoregressive coefficient. The CLS estimates, after a few iterations, are 
substituted in the ULS or ML objective function when one of those methods is specified. In more 
complex models, the likelihood function is more involved, as are the other objective functions. 
Nevertheless the basic ideas presented here generalize nicely to all models handled by PROC 
ARIMA. 

You have no reason to believe that dependence of $Y_t$ on past values should be limited to the previous 
observation $Y_{t-1}$. For example, you may have 

$$Y_t - \mu = \alpha_1 (Y_{t-1} - \mu) + \alpha_2 (Y_{t-2} - \mu) + e_t \qquad (2.4)$$

which is a second-order autoregressive (AR) process. One way to determine if you have this process 
is to examine the autocorrelation plot by using the following SAS statements: 

PROC ARIMA DATA=ESTIMATE; 
   IDENTIFY VAR=Y; 
RUN; 

You thus need to study the form of autocorrelations for such AR processes, which is facilitated by 
writing the models in backshift notation. 

2.2.2 Backshift Notation B for Time Series 

A convenient notation for time series is the backshift notation B where 

$$B(Y_t) = Y_{t-1}$$

That is, B indicates a shifting back of the time subscript. Similarly, 

$$B^2(Y_t) = B(Y_{t-1}) = Y_{t-2}$$

and 

$$B^5(Y_t) = Y_{t-5}$$

Now consider the process 

$$Y_t = .8 Y_{t-1} + e_t$$

In backshift notation this becomes 

$$(1 - .8B) Y_t = e_t$$

You can write 

$$Y_t = (1 - .8B)^{-1} e_t$$

and, recalling that 

$$(1 - X)^{-1} = 1 + X + X^2 + X^3 + \cdots$$

for $|X| < 1$, you obtain 

$$Y_t = (1 + .8B + .8^2 B^2 + .8^3 B^3 + \cdots) e_t$$

or 

$$Y_t = e_t + .8 e_{t-1} + .64 e_{t-2} + \cdots$$

It becomes apparent that the backshift allows you to execute the computations, linking equations 2.1 
and 2.3 in a simplified manner. This technique extends to higher-order processes. For example, let 

$$Y_t = 1.70 Y_{t-1} - .72 Y_{t-2} + e_t \qquad (2.5)$$

Comparing equations 2.5 and 2.4 results in $\mu = 0$, $\alpha_1 = 1.70$, and $\alpha_2 = -.72$. You can rewrite 
equation 2.5 as 

$$(1 - 1.70B + .72B^2) Y_t = e_t$$

or as 

$$Y_t = (1 - 1.70B + .72B^2)^{-1} e_t \qquad (2.6)$$

Algebraic combination shows that 

$$9/(1 - .9B) - 8/(1 - .8B) = 1/(1 - 1.70B + .72B^2)$$

Thus, you can write $Y_t$ as 

$$Y_t = \sum_{j=0}^{\infty} W_j e_{t-j}$$

where 

$$W_j = 9(.9^j) - 8(.8^j)$$

You can see that the influence of early shocks $e_{t-j}$ is minimal because .9 and .8 are less than 1. 
Equation 2.6 allows you to write $Y_t$ as 

$$Y_t = e_t + 1.7 e_{t-1} + 2.17 e_{t-2} + 2.47 e_{t-3} + 2.63 e_{t-4} + 2.69 e_{t-5} + 2.69 e_{t-6} + 2.63 e_{t-7} + 2.53 e_{t-8} + \cdots$$

which you can also accomplish by repeated back substitution as in equation 2.3. Note that the 
weights $W_j$ initially increase before tapering off toward 0. 
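
The weights can be generated numerically. The sketch below computes them two ways: from the closed form $9(.9^j) - 8(.8^j)$ and from the recursion $W_j = 1.70W_{j-1} - .72W_{j-2}$ with $W_0 = 1$, which follows from matching powers of B in $(1 - 1.70B + .72B^2)(W_0 + W_1B + W_2B^2 + \cdots) = 1$. The data set and variable names are illustrative assumptions.

```sas
/* Sketch: weights on past shocks for Y(t)=1.70Y(t-1)-.72Y(t-2)+e(t) */
DATA WEIGHTS;
   WLAG1 = 0;                    /* W(j-1); zero before j = 0        */
   WLAG2 = 0;                    /* W(j-2)                           */
   DO J = 0 TO 8;
      IF J = 0 THEN W = 1;       /* W(0) = 1                         */
      ELSE W = 1.70*WLAG1 - 0.72*WLAG2;
      CLOSED = 9*(0.9**J) - 8*(0.8**J);   /* closed-form weight      */
      OUTPUT;
      WLAG2 = WLAG1;             /* shift the stored weights         */
      WLAG1 = W;
   END;
   KEEP J W CLOSED;
RUN;

PROC PRINT DATA=WEIGHTS NOOBS;
RUN;
```

The two columns should agree, beginning 1, 1.7, 2.17 as in the expansion above.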


2.2.3 Yule-Walker Equations for Covariances 

You have learned how to use backshift notation to write a time series as a weighted sum of past 
shocks (as in equation 2.7); you are now ready to compute covariances $\gamma(j)$. You accomplish this by 
using the Yule-Walker equations. These equations result from multiplying the time series equation, 
such as equation 2.5, by $Y_{t-j}$ and computing expected values. 

For equation 2.5, when you use $j = 0$, you obtain 

$$E(Y_t^2) = 1.70 E(Y_t Y_{t-1}) - .72 E(Y_t Y_{t-2}) + E(Y_t e_t)$$

or 

$$\gamma(0) = 1.70\gamma(1) - .72\gamma(2) + \sigma^2$$

where E stands for expected value. Using equation 2.7 with all subscripts lagged by 1, you see that 
$Y_{t-1}$ involves only $e_{t-1}, e_{t-2}, \ldots$. Thus, 

$$E(Y_{t-1} e_t) = 0$$

When you use $j = 1$, you obtain 

$$E(Y_t Y_{t-1}) = 1.70 E(Y_{t-1}^2) - .72 E(Y_{t-1} Y_{t-2}) + E(Y_{t-1} e_t)$$

Furthermore, 

$$E(Y_{t-1} Y_{t-2}) = \gamma(1)$$

because the difference in subscripts is 

$$(t-1) - (t-2) = 1$$

Also recall that 

$$\gamma(1) = \gamma(-1)$$

Using these ideas, write your second Yule-Walker equation as 

$$\gamma(1) = 1.70\gamma(0) - .72\gamma(1)$$

In the same manner, for all $j > 0$, 

$$\gamma(j) = 1.70\gamma(j-1) - .72\gamma(j-2) \qquad (2.8)$$

If you assume a value for $\sigma^2$ (for example, $\sigma^2 = 10$), you can use the Yule-Walker equations to 
compute autocovariances $\gamma(j)$ and autocorrelations 

$$\rho(j) = \gamma(j)/\gamma(0)$$

The autocorrelations do not depend on $\sigma^2$. The Yule-Walker equations for $j = 0$, $j = 1$, and $j = 2$ are 
three equations in three unknowns: $\gamma(0)$, $\gamma(1)$, and $\gamma(2)$. Solving these (using $\sigma^2 = 10$), you get 
$\gamma(0) = 898.1$, $\gamma(1) = 887.6$, and $\gamma(2) = 862.4$. Using equation 2.8, you then compute 

$$\gamma(3) = 1.7(862.4) - .72(887.6) = 827.0$$

and 

$$\gamma(4) = 1.7(827.0) - .72(862.4) = 785$$

and so forth. 
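
One way to carry out the solution (a sketch; the text states only the results) is to divide the $j = 1$ and $j = 2$ equations by $\gamma(0)$, solve for the autocorrelations, and then use the $j = 0$ equation to recover $\gamma(0)$:

```latex
\begin{aligned}
\rho(1) &= \frac{1.70}{1 + .72} \approx .98837 \\
\rho(2) &= 1.70\,\rho(1) - .72 \approx .96023 \\
\gamma(0)\,[\,1 - 1.70\,\rho(1) + .72\,\rho(2)\,] &= \sigma^2
  \;\Longrightarrow\;
  \gamma(0) \approx \frac{10}{.011135} \approx 898.1 \\
\gamma(1) &= \rho(1)\,\gamma(0) \approx 887.6, \qquad
\gamma(2) = \rho(2)\,\gamma(0) \approx 862.4
\end{aligned}
```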

Thus, the Yule-Walker equations for a second-order AR process (see equation 2.4) are 

$$\gamma(0) = \alpha_1\gamma(1) + \alpha_2\gamma(2) + \sigma^2$$

and 

$$\gamma(j) = \alpha_1\gamma(j-1) + \alpha_2\gamma(j-2), \quad j > 0$$

You have also seen that PROC ARIMA gives estimates of $\gamma(j)$. With that in mind, suppose you 
have a time series with mean 100 and the covariance sequence as follows: 

   j      $\gamma(j)$ 
   0       390 
   1       360 
   2       277.5 
   3       157.5 
   4        19.9 
   5      -113.8 
   6      -223.7 
   7      -294.5 
   8      -317.6 
   9      -292.2 
  10      -223.9 
  11      -125.5 
  12       -13.2 
  13        95.5 
  14       184.4 

The last two observations are 130 and 132, and you want to predict five steps ahead. How do you do 
it? First, you need a model for the data. You can eliminate the first-order AR model based on the 
failure of $\gamma(j)$ to damp out at a constant exponential rate. For example, 

$$\gamma(1)/\gamma(0) = .92$$

but 

$$\gamma(2)/\gamma(1) = .77$$

If the model is a second-order autoregression like equation 2.4, you have the Yule-Walker equations 

$$390 = \alpha_1(360) + \alpha_2(277.5) + \sigma^2$$

$$360 = \alpha_1(390) + \alpha_2(360)$$

and 

$$277.5 = \alpha_1(360) + \alpha_2(390)$$

These can be solved with $\alpha_1 = 1.80$, $\alpha_2 = -.95$, and $\sigma^2 = 5.625$. Thus, in general, if you know or 
can estimate the $\gamma(j)$s, you can find or estimate the coefficients from the Yule-Walker equations. 
You can confirm this diagnosis by checking to see that 

$$\gamma(j) = 1.80\gamma(j-1) - .95\gamma(j-2)$$

for $j = 3, 4, \ldots, 14$. 
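
As a worked sketch of the solution, the last two Yule-Walker equations involve only $\alpha_1$ and $\alpha_2$, so they can be solved by elimination and the result substituted into the first equation. The final two lines check the recursion at $j = 3$ and $j = 4$ against the listed covariances:

```latex
\begin{aligned}
\alpha_2 &= \frac{277.5(390) - 360^2}{390^2 - 360^2}
          = \frac{-21375}{22500} = -.95 \\
\alpha_1 &= \frac{360 - 360\,\alpha_2}{390} = \frac{702}{390} = 1.80 \\
\sigma^2 &= 390 - 1.80(360) + .95(277.5) = 5.625 \\
\gamma(3) &= 1.80(277.5) - .95(360) = 157.5 \\
\gamma(4) &= 1.80(157.5) - .95(277.5) = 19.875 \approx 19.9
\end{aligned}
```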

To predict, you first write your equation 

$$Y_t - 100 = 1.80(Y_{t-1} - 100) - .95(Y_{t-2} - 100) + e_t \qquad (2.9)$$

Assuming your last observation is $Y_n$, you now write 

$$Y_{n+1} = 100 + 1.80(Y_n - 100) - .95(Y_{n-1} - 100) + e_{n+1}$$

and 

$$\hat{Y}_{n+1} = 100 + 1.80(132 - 100) - .95(130 - 100) = 129.1$$

where you recall that 130 and 132 were the last two observations. The prediction error is 

$$Y_{n+1} - \hat{Y}_{n+1} = e_{n+1}$$

with variance $\sigma^2 = 5.625$, and you compute the one-step-ahead prediction interval from 

$$129.1 - 1.96(5.625)^{.5}$$

to 

$$129.1 + 1.96(5.625)^{.5}$$

The prediction of $Y_{n+2}$ arises from 

$$Y_{n+2} = 100 + 1.80(Y_{n+1} - 100) - .95(Y_n - 100) + e_{n+2}$$

and is given by 

$$\hat{Y}_{n+2} = 100 + 1.80(\hat{Y}_{n+1} - 100) - .95(Y_n - 100) = 122$$

The prediction error is 

$$1.8(Y_{n+1} - \hat{Y}_{n+1}) + e_{n+2} = 1.8 e_{n+1} + e_{n+2}$$

with variance 

$$\sigma^2(1 + 1.8^2) = 23.85$$

Using equation 2.9, you compute predictions, replacing unknown Y with predictions and $e_{n+j}$ with 0 
for $j > 0$. You also can monitor prediction error variances. If you express $Y_t$ in the form of equation 
2.7, you get 

$$Y_t - 100 = e_t + 1.8 e_{t-1} + 2.29 e_{t-2} + 2.41 e_{t-3} + \cdots \qquad (2.10)$$

The prediction error variances for one, two, three, and four steps ahead are then $\sigma^2$, $\sigma^2(1 + 1.8^2)$, 
$\sigma^2(1 + 1.8^2 + 2.29^2)$, and $\sigma^2(1 + 1.8^2 + 2.29^2 + 2.41^2)$. 
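
The five-step-ahead predictions and their error variances can be produced with a short recursion. The DATA step below is a sketch under the model of equation 2.9; the data set and variable names are illustrative assumptions, and the weights are generated by $W_j = 1.80W_{j-1} - .95W_{j-2}$ with $W_0 = 1$.

```sas
/* Sketch: AR(2) forecasts and prediction error variances by hand.  */
DATA AR2FCST;
   MU = 100;  A1 = 1.80;  A2 = -0.95;  S2 = 5.625;
   Y1 = 132;            /* most recent value, Y(n)                   */
   Y2 = 130;            /* previous value, Y(n-1)                    */
   W1 = 1;              /* W(0)                                      */
   W2 = 0;              /* weight before W(0)                        */
   SUMSQ = 0;           /* running sum of squared weights            */
   DO LEAD = 1 TO 5;
      FORECAST = MU + A1*(Y1 - MU) + A2*(Y2 - MU);
      SUMSQ    = SUMSQ + W1**2;       /* adds W(LEAD-1) squared      */
      ERRVAR   = S2*SUMSQ;            /* prediction error variance   */
      OUTPUT;
      Y2 = Y1;  Y1 = FORECAST;        /* forecasts replace unknown Ys */
      WNEXT = A1*W1 + A2*W2;          /* next weight                 */
      W2 = W1;  W1 = WNEXT;
   END;
   KEEP LEAD FORECAST ERRVAR;
RUN;

PROC PRINT DATA=AR2FCST NOOBS;
RUN;
```

The first two rows should match the values computed above: forecasts 129.1 and 121.98 (about 122), with error variances 5.625 and 23.85.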
Surprisingly, the weights on $e_{t-j}$ seem to increase as you move further into the past. However, if you 
continue to write out the expression for $Y_t$ in terms of $e_t$, you see that the weights eventually taper 
off toward 0, just as in equation 2.7. You obtained equation 2.10 by writing the model 

$$(1 - 1.80B + .95B^2)(Y_t - \mu) = e_t$$

as 

$$(Y_t - \mu) = (1 - 1.80B + .95B^2)^{-1} e_t = (1 + 1.80B + 2.29B^2 + 2.41B^3 + \cdots) e_t$$

Now replace B with an algebraic variable M. The key to tapering off the weights involves the 
characteristic equation 

$$1 - 1.80M + .95M^2 = 0$$

If all values of M (roots) that solve this equation are larger than 1 in magnitude, the weights taper off. 
In this case, the roots are $M = .95 \pm .39i$, which is a complex pair of numbers with magnitude 1.03. In 
equation 2.5, the roots are 1.11 and 1.25. The condition of roots having a magnitude greater than 1 is 
called stationarity and ensures that shocks $e_{t-j}$ in the distant past have little influence on the current 
observation $Y_t$. 
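
The roots quoted here follow from the quadratic formula; as a brief sketch of the arithmetic:

```latex
\begin{aligned}
M &= \frac{1.80 \pm \sqrt{1.80^2 - 4(.95)}}{2(.95)}
   = \frac{1.80 \pm \sqrt{-.56}}{1.90}
   \approx .947 \pm .394\,i, \qquad
|M| = \sqrt{1/.95} \approx 1.026 \\
1 - 1.70M + .72M^2 &= (1 - .9M)(1 - .8M) = 0
   \;\Longrightarrow\; M = 1/.9 \approx 1.11 \ \text{or}\ M = 1/.8 = 1.25
\end{aligned}
```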

A general review of the discussion so far indicates that an AR model of order p, written as 

$$(Y_t - \mu) = \alpha_1(Y_{t-1} - \mu) + \alpha_2(Y_{t-2} - \mu) + \cdots + \alpha_p(Y_{t-p} - \mu) + e_t \qquad (2.11)$$

can be written in backshift form as 

$$(1 - \alpha_1 B - \alpha_2 B^2 - \cdots - \alpha_p B^p)(Y_t - \mu) = e_t$$

and can be written as an infinite weighted sum of current and past shocks $e_t$ with 

$$(Y_t - \mu) = (1 - \alpha_1 B - \alpha_2 B^2 - \cdots - \alpha_p B^p)^{-1} e_t = (1 + W_1 B + W_2 B^2 + W_3 B^3 + \cdots) e_t$$

where you can find the $W_j$s. The $W_j$s taper off toward 0 if all Ms satisfying 

$$1 - \alpha_1 M - \alpha_2 M^2 - \cdots - \alpha_p M^p = 0$$

are such that $|M| > 1$. 

You have also learned how to compute the system of Yule-Walker equations by multiplying equation 
2.11 on both sides by $(Y_{t-j} - \mu)$ for $j = 0$, $j = 1$, $j = 2$, . . . and by computing expected values. You can 
use these Yule-Walker equations to estimate coefficients $\alpha_j$ when you know or can estimate values 
of the covariances $\gamma(j)$. You have also used covariance patterns to distinguish the second-order AR 
model from the first-order model. 

2.3 Fitting an AR Model in PROC REG 

Chapter 3, “The General ARIMA Model,” shows that associating autocovariance patterns with 
models is crucial for determining an appropriate model for a data set. As you expand your set of 
models, remember that the primary way to distinguish among them is through their covariance 
functions. Thus, it is crucial to build a catalog of their covariance functions as you expand your 
repertoire of models. The covariance functions are like fingerprints, helping you identify the model 
form appropriate for your data. 

Output 2.4 shows a plot of the stocks of silver at the New York Commodity Exchange in 1000 troy 
ounces from December 1976 through May 1981 (Fairchild Publications 1981). If you deal only with 
AR processes, you can fit the models by ordinary regression techniques like PROC REG or PROC 
GLM. You also can simplify the choice of the model's order, as illustrated in Output 2.5, and thus 
simplify your analysis. 

Assuming a fourth-order model is adequate, you regress $Y_t$ on $Y_{t-1}$, $Y_{t-2}$, $Y_{t-3}$, and $Y_{t-4}$ using 
these SAS statements: 

DATA SILVER; 
   TITLE 'MONTH END STOCKS OF SILVER'; 
   INPUT SILVER @@;                   /* read several values per line   */ 
   T=_N_; 
   RETAIN DATE '01DEC76'D LSILVER1-LSILVER4; 
   DATE=INTNX('MONTH',DATE,1);        /* advance the date one month     */ 
   FORMAT DATE MONYY.; 
   OUTPUT; 
   LSILVER4=LSILVER3;                 /* shift lags for the next record */ 
   LSILVER3=LSILVER2; 
   LSILVER2=LSILVER1; 
   LSILVER1=SILVER; 
   CARDS; 
846 827 799 768 719 652 580 546 500 493 530 548 565 
572 632 645 674 693 706 661 648 604 647 684 700 723 
741 734 708 728 737 729 678 651 627 582 521 519 496 
501 555 541 485 476 515 606 694 788 761 794 836 846 
; 
RUN; 

PROC PRINT DATA=SILVER; 
RUN; 

PROC REG DATA=SILVER; 
   MODEL SILVER=LSILVER1 LSILVER2 
                LSILVER3 LSILVER4 / SS1;   /* SS1 prints Type I SS */ 
RUN; 

PROC REG DATA=SILVER; 
   MODEL SILVER=LSILVER1 LSILVER2; 
RUN; 
Output 2.4 Plotting Monthly Stock Values 

[Plot: MONTH END STOCKS OF SILVER. Vertical axis: SILVER. Horizontal axis: DATE, JAN77 through JAN82.] 

Output 2.5 Using PROC PRINT to List the Data and PROC REG to Fit an AR Process 

                      MONTH END STOCKS OF SILVER 

Obs   SILVER    T    DATE    LSILVER1   LSILVER2   LSILVER3   LSILVER4 
  1      846    1   JAN77           .          .          .          . 
  2      827    2   FEB77         846          .          .          . 
  3      799    3   MAR77         827        846          .          . 
  4      768    4   APR77         799        827        846          . 
  5      719    5   MAY77         768        799        827        846 
                         (More Output Lines) 
 48      788   48   DEC80         694        606        515        476 
 49      761   49   JAN81         788        694        606        515 
 50      794   50   FEB81         761        788        694        606 
 51      836   51   MAR81         794        761        788        694 
 52      846   52   APR81         836        794        761        788 

                      MONTH END STOCKS OF SILVER 

                          The REG Procedure 
                            Model: MODEL1 
                     Dependent Variable: SILVER 

                         Analysis of Variance 

                               Sum of         Mean 
Source               DF       Squares       Square    F Value    Pr > F 
Model                 4        417429       104357      95.30    <.0001 
Error                43         47085   1095.00765 
Corrected Total      47        464514 

        Root MSE            33.09090    R-Square    0.8986 
        Dependent Mean     636.89583    Adj R-Sq    0.8892 
        Coeff Var            5.19565 

                         Parameter Estimates 

                   Parameter    Standard 
Variable     DF     Estimate       Error   t Value   Pr > |t|    Type I SS 
Intercept     1    102.84126    37.85904      2.72     0.0095     19470543 
LSILVER1      1      1.38589     0.15156      9.14     <.0001       387295 
LSILVER2      1     -0.44231     0.26078     -1.70     0.0971        28472 
LSILVER3      1      0.00921     0.26137      0.04     0.9720   1061.93530 
LSILVER4      1     -0.11236     0.15185     -0.74     0.4633    599.56290 

                      MONTH END STOCKS OF SILVER 

                          The REG Procedure 
                            Model: MODEL1 
                     Dependent Variable: SILVER 

                         Analysis of Variance 

                               Sum of         Mean 
Source               DF       Squares       Square    F Value    Pr > F 
Model                 2        457454       228727     220.26    <.0001 
Error                47         48808   1038.45850 
Corrected Total      49        506261 

        Root MSE            32.22512    R-Square    0.9036 
        Dependent Mean     642.76000    Adj R-Sq    0.8995 
        Coeff Var            5.01355 

                         Parameter Estimates 

                   Parameter    Standard 
Variable     DF     Estimate       Error   t Value   Pr > |t| 
Intercept     1     77.95372    30.21038      2.58     0.0131 
LSILVER1      1      1.49087     0.11589     12.86     <.0001 
LSILVER2      1     -0.61144     0.11543     -5.30     <.0001 

Output 2.5 shows that lags 3 and 4 may not be needed because the overall F statistic for these two 
lags is computed from their Type I sums of squares and the error mean square as 

$$((1062 + 600)/2)/1095 = .76$$

This is insignificant compared to the F distribution with 2 and 43 degrees of freedom. Alternatively, 
a TEST statement could be used to produce F. 
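
A minimal sketch of the TEST statement approach, assuming the SILVER data set built above (the label LAGS34 is a hypothetical name chosen for illustration):

```sas
/* Sketch: joint F test that the lag 3 and lag 4 coefficients are 0. */
PROC REG DATA=SILVER;
   MODEL SILVER=LSILVER1 LSILVER2 LSILVER3 LSILVER4;
   LAGS34: TEST LSILVER3=0, LSILVER4=0;
RUN;
```

Because lags 3 and 4 enter the model last, this joint test should reproduce the F computed from the Type I sums of squares.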

You have identified the model through overfitting, and now the final estimated model is 

$$Y_t = 77.9537 + 1.4909 Y_{t-1} - .6114 Y_{t-2} + e_t$$

which becomes 

$$Y_t - 647 = 1.4909(Y_{t-1} - 647) - .6114(Y_{t-2} - 647) + e_t$$

All parameters are significant according to their t statistics. 

The fact that $M = 1$ almost solves the characteristic equation 

$$1 - 1.49M + .61M^2 = 0$$

suggests that this series may be nonstationary. 

In Chapter 3, you extend your class of models to include moving averages and mixed ARMA 
models. These models require more sophisticated fitting and identification techniques than the 
simple regression with overfitting used in the silver example. 
3.1 Introduction 


3.1.1 Statistical Background 

The general class of autoregressive moving average (ARMA) models is developed in this chapter. As 
each new model is introduced, its autocovariance function $\gamma(j)$ is given. This helps you use the 
estimated autocovariances $C(j)$ that PROC ARIMA produces to select an appropriate model for the 
data. Using estimated autocovariances to determine a model to be fit is called model identification. 
Once you select the model, you can use PROC ARIMA to fit the model, forecast future values, and 
provide forecast intervals. 

3.1.2 Terminology and Notation 

The moving average of order 1 is given by 

$$Y_t = \mu + e_t - \beta e_{t-1} \qquad (3.1)$$

where $e_t$ is a white noise (uncorrelated) sequence with mean 0 and variance $\sigma^2$. Clearly, 

$$\mathrm{var}(Y_t) = \gamma(0) = \sigma^2(1 + \beta^2)$$

$$\mathrm{cov}(Y_t, Y_{t-1}) = \gamma(1) = E((e_t - \beta e_{t-1})(e_{t-1} - \beta e_{t-2})) = -\beta\sigma^2$$

and 

$$\mathrm{cov}(Y_t, Y_{t-j}) = 0$$

for $j > 1$. 

If you observe the autocovariance sequence $\gamma(0) = 100$, $\gamma(1) = 40$, $\gamma(2) = 0$, $\gamma(3) = 0$, . . . , you are 
dealing with an MA process of order 1 because $\gamma(j) = 0$ for $j > 1$. Also, you know that $-\beta\sigma^2 = 40$ and 
$(1 + \beta^2)\sigma^2 = 100$, so $\beta = -.5$ and $\sigma^2 = 80$. The model is 

$$Y_t = \mu + e_t + .5 e_{t-1}$$
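
As a sketch of how these values can be obtained, divide the two moment equations to eliminate $\sigma^2$ and solve the resulting quadratic in $\beta$. The quadratic has two roots; the choice of the root with $|\beta| < 1$ is a standard convention assumed here rather than something stated at this point in the text:

```latex
\begin{aligned}
\frac{\gamma(1)}{\gamma(0)} = \frac{-\beta}{1+\beta^2} = .4
  &\;\Longrightarrow\; .4\beta^2 + \beta + .4 = 0 \\
\beta &= \frac{-1 \pm \sqrt{1 - 4(.4)(.4)}}{2(.4)} = \frac{-1 \pm .6}{.8}
  \;\Longrightarrow\; \beta = -.5 \ \text{or}\ \beta = -2 \\
\beta = -.5 &\;\Longrightarrow\; \sigma^2 = -40/\beta = 80,
  \qquad (1 + \beta^2)\sigma^2 = 1.25(80) = 100
\end{aligned}
```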

If each autocovariance $\gamma(j)$ is divided by $\gamma(0)$, the resulting sequence of autocorrelations is $\rho(j)$. 
For a moving average like equation 3.1, 

$$\rho(0) = 1$$

$$\rho(1) = -\beta/(1 + \beta^2)$$

and 

$$\rho(j) = 0$$

for $j > 1$. Note that 

$$-1/2 \le -\beta/(1 + \beta^2) \le 1/2$$

regardless of the value $\beta$. In the example, the autocorrelations for lags 0 through 4 are 1, .4, 0, 0, and 
0. 

The general moving average of order q is written as 

$$Y_t = \mu + e_t - \beta_1 e_{t-1} - \cdots - \beta_q e_{t-q}$$

and is characterized by the fact that $\gamma(j)$ and $\rho(j)$ are 0 for $j > q$. In backshift notation you write 

$$Y_t = \mu + (1 - \beta_1 B - \beta_2 B^2 - \cdots - \beta_q B^q) e_t$$

Similarly, you write the mixed autoregressive moving average model ARMA(p,q) as 

$$(Y_t - \mu) - \alpha_1(Y_{t-1} - \mu) - \cdots - \alpha_p(Y_{t-p} - \mu) = e_t - \beta_1 e_{t-1} - \cdots - \beta_q e_{t-q}$$

or in backshift notation as 

$$(1 - \alpha_1 B - \cdots - \alpha_p B^p)(Y_t - \mu) = (1 - \beta_1 B - \cdots - \beta_q B^q) e_t$$

For example, the model 

$$(1 - .6B) Y_t = (1 + .4B) e_t$$

is an ARMA(1,1) with mean $\mu = 0$. In practice, parameters are estimated and then used to estimate 
prediction error variances for several periods ahead. PROC ARIMA provides these computations. 


3.2 Prediction 


3.2.1 One-Step-Ahead Predictions 

You can further clarify the example above by predicting sequentially one step at a time. Let n denote
the number of available observations. The next observation, $Y_{n+1}$, satisfies

$$Y_{n+1} = .6Y_n + e_{n+1} + .4e_n$$

First, predict $Y_{n+1}$ by

$$\hat{Y}_{n+1} = .6Y_n + .4e_n$$

with error variance $\sigma^2$. Next,

$$Y_{n+2} = .6Y_{n+1} + e_{n+2} + .4e_{n+1} = .6(.6Y_n + e_{n+1} + .4e_n) + e_{n+2} + .4e_{n+1}$$

so predict $Y_{n+2}$ by removing "future $e$s" (subscripts greater than n):

$$\hat{Y}_{n+2} = .36Y_n + .24e_n = .6\hat{Y}_{n+1}$$

The prediction error is $e_{n+2} + e_{n+1}$, which has variance $2\sigma^2$. Finally,

$$Y_{n+3} = .6Y_{n+2} + e_{n+3} + .4e_{n+2}$$

and

$$\hat{Y}_{n+3} = .6\hat{Y}_{n+2} + 0$$

so the prediction error is

$$Y_{n+3} - \hat{Y}_{n+3} = .6(Y_{n+2} - \hat{Y}_{n+2}) + e_{n+3} + .4e_{n+2} = .6(e_{n+2} + e_{n+1}) + e_{n+3} + .4e_{n+2}$$

and the prediction error variance is $2.36\sigma^2$. This example shows that you can readily compute
predictions and associated error variances after model parameters or their estimates are available.
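The error variances 1, 2, and 2.36 (in units of $\sigma^2$) are cumulative sums of squared weights on the future shocks. A minimal Python sketch of that bookkeeping follows; it is an illustration only, not what PROC ARIMA runs internally, and the function names are hypothetical. It assumes the ARMA(1,1) form $(1 - \alpha B)Y_t = (1 - \beta B)e_t$, so the example has $\alpha = .6$ and $\beta = -.4$.

```python
def psi_weights_arma11(alpha, beta, count):
    """Weights psi_0..psi_{count-1} in Y_t = sum_j psi_j * e_{t-j}
    for (1 - alpha*B) Y_t = (1 - beta*B) e_t."""
    psi = [1.0]
    for j in range(1, count):
        # psi_1 = alpha - beta, and psi_j = alpha * psi_{j-1} for j >= 2
        psi.append(alpha * psi[-1] - (beta if j == 1 else 0.0))
    return psi

def forecast_error_variances(alpha, beta, steps, sigma2=1.0):
    """Variance of the L-step-ahead prediction error for L = 1..steps."""
    psi = psi_weights_arma11(alpha, beta, steps)
    variances, running = [], 0.0
    for w in psi:
        running += w * w
        variances.append(sigma2 * running)
    return variances

variances = forecast_error_variances(alpha=0.6, beta=-0.4, steps=3)
print(variances)  # approximately [1.0, 2.0, 2.36]
```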
The predictions for the model

$$Y_t = .6Y_{t-1} + e_t + .4e_{t-1}$$

can be computed recursively as follows:

t              1     2     3      4       5       6       7       8       9
Observation   10     5    -3     -8       1       6       —       —       —
Prediction   (0)    10     1   -3.4   -6.64   3.656   4.538   2.723   1.634
Residual      10    -5    -4   -4.6    7.64   2.344       —       —       —


Start by assuming the mean (0) as a prediction of $Y_1$ with implied error $e_1 = 10$. Predict $Y_2$ by
$0.6Y_1 + 0.4e_1 = 10$, using the assumed $e_1 = 10$. The residual is $r_2 = 5 - 10 = -5$. Using $r_2$ as an
estimate of $e_2$, predict $Y_3$ by

$$0.6Y_2 + 0.4r_2 = 0.6(5) + 0.4(-5) = 1$$

The residual is $r_3 = Y_3 - 1 = -4$. Then predict $Y_4$ by

$$0.6Y_3 + 0.4r_3 = 0.6(-3) + 0.4(-4) = -3.4$$

and $Y_5$ by -6.64 and $Y_6$ by 3.656. These are one-step-ahead predictions for the historic data. For
example, you use only the data up through t = 3 (and the assumed $e_1 = 10$) to predict $Y_4$. The sum
of squares of these residuals, $100 + 25 + \cdots + 2.344^2 = 226.024$, is called the conditional sum of
squares associated with the parameters 0.6 and 0.4. If you search over AR and MA parameters to find
those that minimize this conditional sum of squares, you are performing conditional least squares
estimation, the default in PROC ARIMA. An estimate of the white noise variance is given by
dividing the conditional sum of squares by n minus the number of estimated parameters; that is,
$n - 2 = 6 - 2 = 4$ for this ARMA(1,1) with mean 0.
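The recursion in the table can be reproduced in a few lines. This is a minimal Python sketch of the arithmetic only (PROC ARIMA does the equivalent internally when it evaluates the conditional sum of squares). The function name `one_step_ahead` is hypothetical, and the model is written as $(Y_t - \mu) = \alpha(Y_{t-1} - \mu) + e_t - \beta e_{t-1}$, so the example uses $\alpha = .6$ and $\beta = -.4$.

```python
def one_step_ahead(y, alpha, beta, mean=0.0):
    """One-step-ahead predictions and residuals, starting from the mean
    as the prediction of the first observation."""
    predictions, residuals = [], []
    prev_dev, prev_resid = 0.0, 0.0  # pre-sample deviation and shock set to 0
    for obs in y:
        pred = mean + alpha * prev_dev - beta * prev_resid
        resid = obs - pred
        predictions.append(pred)
        residuals.append(resid)
        prev_dev, prev_resid = obs - mean, resid
    return predictions, residuals

y = [10, 5, -3, -8, 1, 6]
preds, resids = one_step_ahead(y, alpha=0.6, beta=-0.4)
css = sum(r * r for r in resids)   # conditional sum of squares
sigma2_hat = css / (len(y) - 2)    # divide by n minus number of parameters

print([round(p, 3) for p in preds])
print([round(r, 3) for r in resids])
print(round(css, 3), round(sigma2_hat, 3))
```

Minimizing `css` over `alpha` and `beta` is what conditional least squares estimation amounts to; the sketch evaluates it at one parameter pair only.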


3.2.2 Future Predictions 

Predictions into the future are of real interest, while one-step-ahead computations are used to start the
process. Continuing the process as shown, estimate $e_t$s as 0 for t beyond n (n = 6 observations in the
example); that is, estimate future Ys by their predictions.

The next three predictions are as follows:

$Y_7$ is 0.6(6) + 0.4(2.344) = 4.538 with error $e_7$
$Y_8$ is 0.6(4.538) + 0.4(0) = 2.723 with error $e_8 + e_7$
$Y_9$ is 0.6(2.723) + 0.4(0) = 1.634 with error $e_9 + e_8 + 0.6e_7$.

PROC ARIMA provides these computations for you. The illustration simply shows what PROC 
ARIMA is computing. 
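As a check on the three future predictions, here is a short Python sketch (illustrative only; the function name `forecast_arma11` is hypothetical). It takes the last observation and last residual from the one-step-ahead table as given and replaces every future shock by 0.

```python
def forecast_arma11(last_obs, last_resid, alpha, beta, steps, mean=0.0):
    """Forecasts 1..steps periods ahead for
    (Y_t - mean) = alpha*(Y_{t-1} - mean) + e_t - beta*e_{t-1}."""
    forecasts = []
    dev = alpha * (last_obs - mean) - beta * last_resid  # one step ahead
    for _ in range(steps):
        forecasts.append(mean + dev)
        dev = alpha * dev  # future shocks are estimated by 0
    return forecasts

future = forecast_arma11(last_obs=6, last_resid=2.344,
                         alpha=0.6, beta=-0.4, steps=3)
print([round(f, 3) for f in future])  # forecasts of Y_7, Y_8, Y_9
```

After the first step each forecast is just 0.6 times the previous one, which is the exponential decline toward the mean described in the text.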

Note that the prediction of $Y_{7+j}$ is just $(.6)^j \hat{Y}_7$ and thus declines exponentially to the series mean (0
in the example). The prediction error variance increases from $\mathrm{var}(e_t)$ to $\mathrm{var}(Y_t)$. In a practical
application, the form

$$Y_t - \alpha Y_{t-1} = e_t - \beta e_{t-1}$$

and parameter values

$$Y_t - .6Y_{t-1} = e_t + .4e_{t-1}$$

are not known. They can be determined through PROC ARIMA.

In practice, estimated parameters are used to compute predictions and standard errors. This procedure 
requires sample sizes much larger than those in the example above. 

Although they would not have to be, the forecasting methods used in PROC ARIMA are tied to the
method of estimation. If you use conditional least squares, the forecast is based on the expression of
$Y_t - \mu$ as an infinite autoregression. For example, suppose $Y_t = \mu + e_t - \beta e_{t-1}$, a simple MA(1). Note
that $e_t = Y_t - \mu + \beta e_{t-1}$, so at time $t-1$ you have $e_{t-1} = Y_{t-1} - \mu + \beta e_{t-2}$. Substituting this second
expression into the first, you have $e_t = (Y_t - \mu) + \beta(Y_{t-1} - \mu) + \beta^2 e_{t-2}$. Continuing in this fashion, and
assuming that $|\beta| < 1$ so that $\beta^j$ converges to 0 as j gets large, you find

$$e_t = \sum_{j=0}^{\infty} \beta^j (Y_{t-j} - \mu)$$

which can alternatively be expressed as

$$(Y_t - \mu) = -\sum_{j=1}^{\infty} \beta^j (Y_{t-j} - \mu) + e_t$$

Thus the forecast of $Y_t$ given data up to time $t-1$ is

$$\hat{Y}_t = \mu - \sum_{j=1}^{\infty} \beta^j (Y_{t-j} - \mu)$$

The expression $\sum_{j=1}^{\infty} \beta^j (Y_{t-j} - \mu)$ depends on Y values prior to time 1, the "infinite past." PROC ARIMA
assumes Y values before time 1 are just equal to $\mu$ and, of course, the parameters are replaced by
estimates. The truncated sum $\sum_{j=1}^{t-1} \beta^j (Y_{t-j} - \mu)$ is not necessarily the best linear combination of
lagged Y values for predicting $Y_t - \mu$.

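When ML or ULS estimation is used, optimal linear forecasts based on the finite past are computed.
Suppose you want to minimize $E\{[(Y_t - \mu) - \phi_1(Y_{t-1} - \mu) - \phi_2(Y_{t-2} - \mu)]^2\}$ by finding $\phi_1$ and $\phi_2$;
that is, you want the minimum variance forecast of $Y_t$ based on its two predecessors. Note that here
$\phi_1$ and $\phi_2$ are just coefficients; they do not represent autoregressive or moving average parameters.
Using calculus you find

$$\begin{pmatrix} \phi_1 \\ \phi_2 \end{pmatrix} = \begin{pmatrix} \gamma(0) & \gamma(1) \\ \gamma(1) & \gamma(0) \end{pmatrix}^{-1} \begin{pmatrix} \gamma(1) \\ \gamma(2) \end{pmatrix}$$

This would give the best forecast of $Y_3$ based on a linear combination of $Y_1$ and $Y_2$:

$$\hat{Y}_3 = \mu + \phi_1(Y_2 - \mu) + \phi_2(Y_1 - \mu)$$

Likewise, to forecast $Y_5$ using $Y_1, Y_2, Y_3, Y_4$, the four $\phi_j$s are computed as

$$\begin{pmatrix} \phi_1 \\ \phi_2 \\ \phi_3 \\ \phi_4 \end{pmatrix} = \begin{pmatrix} \gamma(0) & \gamma(1) & \gamma(2) & \gamma(3) \\ \gamma(1) & \gamma(0) & \gamma(1) & \gamma(2) \\ \gamma(2) & \gamma(1) & \gamma(0) & \gamma(1) \\ \gamma(3) & \gamma(2) & \gamma(1) & \gamma(0) \end{pmatrix}^{-1} \begin{pmatrix} \gamma(1) \\ \gamma(2) \\ \gamma(3) \\ \gamma(4) \end{pmatrix}$$

Here $\gamma(h)$ is the autocovariance at lag h, and the equations for the $\phi_j$s can be set up for any ARMA
structure and any number of lags. For the MA(1) example with parameter $\beta$, to predict the fifth Y
you have

$$\begin{pmatrix} \phi_1 \\ \phi_2 \\ \phi_3 \\ \phi_4 \end{pmatrix} = \begin{pmatrix} 1+\beta^2 & -\beta & 0 & 0 \\ -\beta & 1+\beta^2 & -\beta & 0 \\ 0 & -\beta & 1+\beta^2 & -\beta \\ 0 & 0 & -\beta & 1+\beta^2 \end{pmatrix}^{-1} \begin{pmatrix} -\beta \\ 0 \\ 0 \\ 0 \end{pmatrix}$$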


For reasonably long time series whose parameters are well inside the stationarity and invertibility
regions, the best linear combination forecast used when ML or ULS is specified does not differ by
much from the truncated sum used when CLS is specified. (See Section 3.3.1.)

For an MA(1) process with lag 1 parameter $\beta = 0.8$, the weights on past Y, used in forecasting 1 step
ahead, are listed below. The top row shows the first 14 weights assuming infinite past ($-(0.8)^j$), and
the next two rows show finite past weights for n = 9 and n = 14 past observations.

Lag                   Y_{t-1}  Y_{t-2}  Y_{t-3}  Y_{t-4}  Y_{t-5}  Y_{t-6}  Y_{t-7}  ...  Y_{t-13}  Y_{t-14}
Infinite past            -.80     -.64     -.51     -.41     -.33     -.26     -.21  ...      -.05      -.04
n = 9, finite past       -.79     -.61     -.47     -.35     -.25     -.16     -.08  ...         —         —
n = 14, finite past      -.80     -.64     -.51     -.41     -.32     -.26     -.21  ...      -.02      -.02

Despite the fairly large $\beta$ and small n values, the weights are quite similar. Increasing n to 25
produces weights indistinguishable, out to 2 decimal places, from those for the infinite past.
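Finite-past weights of this kind come from solving the autocovariance equations shown earlier. The Python sketch below does so with the Durbin-Levinson recursion, which is a standard way to solve such Toeplitz systems; it is a choice made for this illustration and is not claimed to be the algorithm PROC ARIMA uses. The function name `best_linear_weights` is hypothetical, and $\sigma^2 = 1$ is assumed because it cancels from the weights.

```python
def best_linear_weights(gamma, n):
    """Coefficients phi_1..phi_n of the best linear predictor of Y_t from
    Y_{t-1}, ..., Y_{t-n}, given autocovariances gamma[0..n]
    (Durbin-Levinson recursion)."""
    phi, v = [], gamma[0]
    for k in range(1, n + 1):
        num = gamma[k] - sum(phi[j] * gamma[k - 1 - j] for j in range(k - 1))
        last = num / v
        phi = [phi[j] - last * phi[k - 2 - j] for j in range(k - 1)] + [last]
        v *= 1.0 - last * last  # updated one-step prediction error variance
    return phi

beta = 0.8
n = 14
# MA(1) autocovariances with sigma^2 = 1: gamma(0) = 1 + beta^2, gamma(1) = -beta
gamma = [1.0 + beta ** 2, -beta] + [0.0] * (n - 1)

finite = best_linear_weights(gamma, n)               # weights on Y_{t-1}..Y_{t-n}
infinite = [-(beta ** j) for j in range(1, n + 1)]   # infinite-past weights

print([round(w, 2) for w in finite[:6]])
print([round(w, 2) for w in infinite[:6]])
```

Changing `n` shows how quickly the finite-past weights on recent lags approach $-(0.8)^j$ as more history becomes available.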

If $\beta = 1$, the series $Y_t = e_t - 1e_{t-1}$ is said to be "noninvertible," indicating that you cannot get a nice,
convergent series representation of $e_t$ as a function of current and lagged Y values. Not only does
this negate the discussion above, but since a reasonable estimate of $e_t$ cannot be extracted from the
data, it eliminates any sensible model-based forecasting. The moving average of order q,
$Y_t = e_t - \beta_1 e_{t-1} - \cdots - \beta_q e_{t-q}$, has an associated polynomial equation in the algebraic variable M,
$1 - \beta_1 M - \cdots - \beta_q M^q = 0$, whose roots must satisfy $|M| > 1$ in order for the series to be invertible.
Note the analogy with the characteristic equation computed from the autoregressive coefficients.

Fortunately in practice it is rare to encounter a naturally measured series that appears to be
noninvertible. However, when differences are taken, noninvertibility can be artificially induced. For
example, the time series $Y_t = \alpha_0 + \alpha_1 t + e_t$ is a simple linear trend plus white noise. Some
practitioners have the false impression that any sort of trend in a time series should be removed by
taking differences. If that is done, one sees that

$$Y_t - Y_{t-1} = (\alpha_0 + \alpha_1 t + e_t) - (\alpha_0 + \alpha_1 (t-1) + e_{t-1}) = \alpha_1 + e_t - e_{t-1}$$

so that in the process of reducing the trend $\alpha_0 + \alpha_1 t$ to a constant $\alpha_1$, a noninvertible moving
average has been produced. Note that the parameters of $Y_t = \alpha_0 + \alpha_1 t + e_t$ are best estimated by the
ordinary least squares regression of $Y_t$ on t, this being a fundamental result of basic statistical
theory. The practitioner perhaps was confused by thinking in a very narrow time series way.
3.3 Model Identification


3.3.1 Stationarity and Invertibility 

Consider the ARMA model

$$(1 - \alpha_1 B - \alpha_2 B^2 - \cdots - \alpha_p B^p)(Y_t - \mu) = (1 - \beta_1 B - \beta_2 B^2 - \cdots - \beta_q B^q)e_t$$

The model is stationary if all values of M such that

$$1 - \alpha_1 M - \alpha_2 M^2 - \cdots - \alpha_p M^p = 0$$

are larger than 1 in absolute value. Stationarity ensures that early values of e have little influence on
the current value of Y. It also ensures that setting a few values of e to 0 at the beginning of a series
does not affect the predictions very much, provided the series is moderately long. In the ARMA(1,1)
example, the prediction of $Y_6$ with 0 as an estimate of $e_1$ differs from the prediction using the true $e_1$
by the quantity $.01e_1$. Any MA process is stationary. One AR example is

$$(1 - 1.3B + .3B^2)Y_t = e_t$$

which is not stationary (the roots of $1 - 1.3M + .3M^2 = 0$ are M = 1 and M = 10/3). Another example is

$$(1 - 1.3B + .42B^2)Y_t = e_t$$

which is stationary (the roots of $1 - 1.3M + .42M^2 = 0$ are M = 10/7 and M = 10/6).
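The two root calculations can be verified directly. The following Python sketch handles the second-order case only, using the quadratic formula; the function names are hypothetical, and the small tolerance used to treat a root at exactly 1 as "not outside" the unit circle is an assumption of this illustration. The same check applied to the MA polynomial tests invertibility.

```python
import cmath

def quadratic_roots(c0, c1, c2):
    """Roots of c0 + c1*M + c2*M**2 = 0, assuming c2 != 0."""
    disc = cmath.sqrt(c1 * c1 - 4.0 * c2 * c0)
    return ((-c1 + disc) / (2.0 * c2), (-c1 - disc) / (2.0 * c2))

def all_outside_unit_circle(roots, tol=1e-9):
    """True when every root has absolute value clearly greater than 1."""
    return all(abs(r) > 1.0 + tol for r in roots)

roots_a = quadratic_roots(1.0, -1.3, 0.3)    # 1 - 1.3M + .3M^2
roots_b = quadratic_roots(1.0, -1.3, 0.42)   # 1 - 1.3M + .42M^2

print(sorted(abs(r) for r in roots_a), all_outside_unit_circle(roots_a))
print(sorted(abs(r) for r in roots_b), all_outside_unit_circle(roots_b))
```

The first polynomial has a root at M = 1, so the check fails; the second has both roots outside the unit circle.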

A series satisfies the invertibility condition if all Ms for which

$$1 - \beta_1 M - \beta_2 M^2 - \cdots - \beta_q M^q = 0$$

are such that $|M| > 1$. The invertibility condition ensures that $Y_t$ can be expressed in terms of $e_t$ and
an infinite weighted sum of previous Ys. In the example,

$$e_t = (1 + .4B)^{-1}(1 - .6B)Y_t$$

and

$$e_t = Y_t - Y_{t-1} + .4Y_{t-2} - .16Y_{t-3} + .064Y_{t-4} - \cdots$$

so

$$Y_t = e_t + Y_{t-1} - .4Y_{t-2} + .16Y_{t-3} - .064Y_{t-4} + \cdots$$

The decreasing weights on lagged values of Y allow you to estimate $e_t$ from recent values of Y. Note
that in Section 3.2.1 the forecast of $Y_{n+1}$ was $.6Y_n + .4e_n$, so the ability to estimate $e_n$ from the data was
crucial.
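The weights 1, -1, .4, -.16, .064 arise from dividing the AR polynomial by the MA polynomial as power series in B. A minimal Python sketch of that division follows; the function name `pi_weights` and the convention of passing each polynomial as a list of coefficients in increasing powers of B are assumptions of this illustration.

```python
def pi_weights(ar_poly, ma_poly, count):
    """First `count` coefficients of ar_poly(B) / ma_poly(B) as a power
    series in B. Each polynomial is a list [c0, c1, ...] with c0 != 0."""
    out = []
    for j in range(count):
        a_j = ar_poly[j] if j < len(ar_poly) else 0.0
        upper = min(j, len(ma_poly) - 1)
        acc = a_j - sum(ma_poly[i] * out[j - i] for i in range(1, upper + 1))
        out.append(acc / ma_poly[0])
    return out

# (1 - .6B) Y_t = (1 + .4B) e_t  =>  e_t = [(1 - .6B) / (1 + .4B)] Y_t
weights = pi_weights([1.0, -0.6], [1.0, 0.4], 5)
print([round(w, 4) for w in weights])  # weights on Y_t, Y_{t-1}, ..., Y_{t-4}
```

Because the MA root lies outside the unit circle, these weights shrink geometrically (by a factor of -.4 from lag 1 onward), which is what makes $e_t$ recoverable from recent data.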
3.3.2 Time Series Identification

You need to identify the form of the model. You can do this in PROC ARIMA by inspecting data- 
derived estimates of three functions: 

□ autocorrelation function (ACF) 

□ inverse autocorrelation function (IACF) 

□ partial autocorrelation function (PACF). 

These functions are defined below. A short catalog of examples is developed, and properties useful 
for associating different forms of these functions with the corresponding time series forms are 
summarized. 

In PROC ARIMA, an IDENTIFY statement produces estimates of all these functions. For example, 
the following SAS statements produce lists and plots of all three of these functions for the variable Y 
in the data set SERIES: 

   PROC ARIMA DATA=SERIES;
      IDENTIFY VAR=Y;
   RUN;

3.3.2.1 Autocovariance Function $\gamma(j)$

Recall that $\gamma(j)$ is the covariance between $Y_t$ and $Y_{t-j}$, which is assumed to be the same for every t
(stationarity). See the listing below of autocovariance functions for Series 1-8 (in these examples, $e_t$
is white noise with variance $\sigma^2 = 1$).

Series   Model
1        $Y_t = .8Y_{t-1} + e_t$, AR(1), $\gamma(1) > 0$
2        $Y_t = -.8Y_{t-1} + e_t$, AR(1), $\gamma(1) < 0$
3        $Y_t = .3Y_{t-1} + .4Y_{t-2} + e_t$, AR(2)
4        $Y_t = .7Y_{t-1} - .49Y_{t-2} + e_t$, AR(2)
5        $Y_t = e_t + .8e_{t-1}$, MA(1)
6        $Y_t = e_t - .3e_{t-1} - .4e_{t-2}$, MA(2)
7        $Y_t = e_t$ (white noise)
8        $Y_t = .6Y_{t-1} + e_t + .4e_{t-1}$, ARMA(1,1)


For an AR(1) series $Y_t - \rho Y_{t-1} = e_t$ (such as Series 1 and 2), the covariance sequence is

$$\gamma(j) = \rho^{|j|}\sigma^2/(1 - \rho^2)$$

For an AR(2) series $Y_t - \alpha_1 Y_{t-1} - \alpha_2 Y_{t-2} = e_t$ (such as Series 3 and 4), the covariance sequence
begins with values $\gamma(0)$ and $\gamma(1)$, followed by $\gamma(j)$ that satisfy

$$\gamma(j) - \alpha_1\gamma(j-1) - \alpha_2\gamma(j-2) = 0$$

The covariances may oscillate with a period depending on $\alpha_1$ and $\alpha_2$ (such as Series 4). Beginning
values are determined from the Yule-Walker equations.

For a general AR(p) series

$$Y_t - \alpha_1 Y_{t-1} - \alpha_2 Y_{t-2} - \cdots - \alpha_p Y_{t-p} = e_t$$

beginning values are $\gamma(0), \ldots, \gamma(p-1)$, from which $\gamma(j)$ satisfies

$$\gamma(j) - \alpha_1\gamma(j-1) - \alpha_2\gamma(j-2) - \cdots - \alpha_p\gamma(j-p) = 0$$

for $j \ge p$. The fact that $\gamma(j)$ satisfies the same difference equation as the series ensures that
$|\gamma(j)| < H\lambda^j$, where $0 < \lambda < 1$ and H is some finite constant. In other words, $\gamma(j)$ may oscillate, but
it is bounded by a function that decreases exponentially to zero.

For MA(1),

$$Y_t - \mu = e_t - \beta e_{t-1}$$

$$\gamma(0) = (1 + \beta^2)\sigma^2$$

$$\gamma(1) = \gamma(-1) = -\beta\sigma^2$$

and $\gamma(j) = 0$ for $|j| > 1$.

For a general MA(q)

$$Y_t - \mu = e_t - \beta_1 e_{t-1} - \beta_2 e_{t-2} - \cdots - \beta_q e_{t-q}$$

the beginning values are $\gamma(0), \gamma(1), \ldots, \gamma(q)$. Then $\gamma(j) = 0$ for $|j| > q$.

For an ARMA(1,1) process

$$(Y_t - \mu) - \alpha(Y_{t-1} - \mu) = e_t - \beta e_{t-1}$$

there is a dropoff from $\gamma(0)$ to $\gamma(1)$ determined by $\alpha$ and $\beta$. For $j > 1$, the pattern

$$\gamma(j) = \alpha\gamma(j-1)$$

occurs. Thus, an apparently arbitrary drop followed by exponential decay characterizes the
ARMA(1,1) covariance function.

For the ARMA(p,q) process

$$(Y_t - \mu) - \alpha_1(Y_{t-1} - \mu) - \cdots - \alpha_p(Y_{t-p} - \mu) = e_t - \beta_1 e_{t-1} - \cdots - \beta_q e_{t-q}$$

there are

$$r = \max(p - 1, q)$$

beginning values followed by behavior characteristic of an AR(p); that is,

$$\gamma(j) - \alpha_1\gamma(j-1) - \cdots - \alpha_p\gamma(j-p) = 0$$

for $j > r$. For a white noise sequence, $\gamma(j) = 0$ if $j \ne 0$.
3.3.2.2 ACF

Note that the pattern, rather than the magnitude of the sequence $\gamma(j)$, is associated with the model
form. Normalize the autocovariance sequence $\gamma(j)$ by computing autocorrelations

$$\rho(j) = \gamma(j)/\gamma(0)$$

Note that

$$\rho(0) = 1$$

for all series and that

$$\rho(j) = \rho(-j)$$

The ACFs for the eight series previously listed are listed below.

Series   Model, ACF
1        $Y_t = .8Y_{t-1} + e_t$, $\rho(j) = .8^{|j|}$
2        $Y_t = -.8Y_{t-1} + e_t$, $\rho(j) = (-.8)^{|j|}$
3        $Y_t = .3Y_{t-1} + .4Y_{t-2} + e_t$, $\rho(1) = .5000$,
         $\rho(j) = .3\rho(j-1) + .4\rho(j-2)$ for $j > 1$
4        $Y_t = .7Y_{t-1} - .49Y_{t-2} + e_t$, $\rho(1) = .4698$,
         $\rho(j) = .7\rho(j-1) - .49\rho(j-2)$ for $j > 1$
5        $Y_t = e_t + .8e_{t-1}$, $\rho(1) = .4878$, $\rho(j) = 0$ for $j > 1$
6        $Y_t = e_t - .3e_{t-1} - .4e_{t-2}$, $\rho(1) = -.144$,
         $\rho(2) = -.32$, $\rho(j) = 0$ for $j > 2$
7        $Y_t = e_t$, $\rho(0) = 1$, $\rho(j) = 0$ for $j > 0$
8        $Y_t - .6Y_{t-1} = e_t + .4e_{t-1}$, $\rho(0) = 1$, $\rho(1) = .7561$,
         $\rho(j) = .6\rho(j-1)$ for $j > 1$

The ACFs are plotted in Output 3.1.
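The lag-1 values quoted in the catalog can be checked by expressing each series as a weighted sum of past shocks and summing products of the weights. The Python sketch below does this with a long but finite sum, so it is a numerical approximation rather than an exact formula. The names `psi_weights` and `acf`, the truncation length `terms`, and the sign convention (coefficients entered as they appear on the right-hand side of $Y_t = \ldots$) are all assumptions of this illustration.

```python
def psi_weights(ar, ma, count):
    """Weights psi_j in Y_t = sum_j psi_j e_{t-j} for
    Y_t = ar[0]*Y_{t-1} + ... + e_t + ma[0]*e_{t-1} + ..."""
    psi = []
    for j in range(count):
        val = 1.0 if j == 0 else (ma[j - 1] if j - 1 < len(ma) else 0.0)
        val += sum(ar[i] * psi[j - 1 - i] for i in range(min(j, len(ar))))
        psi.append(val)
    return psi

def acf(ar, ma, max_lag, terms=500):
    """Autocorrelations rho(0..max_lag) from a truncated sum of psi products."""
    psi = psi_weights(ar, ma, terms + max_lag)
    gamma = [sum(psi[i] * psi[i + h] for i in range(terms))
             for h in range(max_lag + 1)]
    return [g / gamma[0] for g in gamma]

rho_series4 = acf([0.7, -0.49], [], 5)   # Series 4, AR(2)
rho_series5 = acf([], [0.8], 2)          # Series 5, MA(1)
rho_series6 = acf([], [-0.3, -0.4], 3)   # Series 6, MA(2)
rho_series8 = acf([0.6], [0.4], 2)       # Series 8, ARMA(1,1)

print([round(r, 3) for r in rho_series4[1:]])
print(round(rho_series5[1], 4), round(rho_series8[1], 4))
print([round(r, 3) for r in rho_series6[1:]])
```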
3.3.2.3 PACF

The PACF is motivated by the regression approach to the silver example in Chapter 2, "Simple
Models: Autoregression." First, regress $Y_t$ on $Y_{t-1}$ and call the coefficient on $Y_{t-1}$ $\hat{\pi}_1$. Next, regress $Y_t$
on $Y_{t-1}$, $Y_{t-2}$ and call the coefficient on $Y_{t-2}$ $\hat{\pi}_2$. Continue in this manner, regressing $Y_t$ on $Y_{t-1}$,
$Y_{t-2}, \ldots, Y_{t-j}$ and calling the last coefficient $\hat{\pi}_j$. The $\hat{\pi}_j$ values are the estimated partial
autocorrelations.

In an autoregression of order p, the coefficients $\hat{\pi}_j$ estimate 0s for all $j > p$. The theoretical partial
autocorrelations $\pi_j$ estimated by the $\hat{\pi}_j$ are obtained by solving equations similar to the regression
normal equations.

$$\begin{pmatrix} \gamma(0) & \gamma(1) & \cdots & \gamma(j-1) \\ \gamma(1) & \gamma(0) & \cdots & \gamma(j-2) \\ \vdots & \vdots & & \vdots \\ \gamma(j-1) & \gamma(j-2) & \cdots & \gamma(0) \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_j \end{pmatrix} = \begin{pmatrix} \gamma(1) \\ \gamma(2) \\ \vdots \\ \gamma(j) \end{pmatrix}$$

For each j, let $\pi_j = b_j$. (A new set of equations is needed for each j.) As with autocorrelations, the
$\pi_j$ sequence is useful for identifying the form of a time series model. The PACF is most useful for
identifying AR processes because, for an AR(p), the PACF is 0 beyond lag p. For MA or mixed
(ARMA) processes, the theoretical PACF does not become 0 after a fixed number of lags.

You can solve the previous set of equations for the catalog of series. When you observe an estimated
PACF $\hat{\pi}_j$, compare its behavior to the behavior shown next to choose a model. The following is a
list of actual partial autocorrelations for Series 1-8:



                                                              Lag
Series   Model                                      1         2         3         4         5
1        Y_t = .8Y_{t-1} + e_t                    0.8         0         0         0         0
2        Y_t = -.8Y_{t-1} + e_t                  -0.8         0         0         0         0
3        Y_t = .3Y_{t-1} + .4Y_{t-2} + e_t        0.5       0.4         0         0         0
4        Y_t = .7Y_{t-1} - .49Y_{t-2} + e_t    0.4698   -0.4900         0         0         0
5        Y_t = e_t + .8e_{t-1}                 0.4878   -0.3123    0.2215   -0.1652    0.1267
6        Y_t = e_t - .3e_{t-1} - .4e_{t-2}     -0.144   -0.3480   -0.1304   -0.1634   -0.0944
7        Y_t = e_t                                  0         0         0         0         0
8        Y_t = .6Y_{t-1} + e_t + .4e_{t-1}     0.7561   -0.2756    0.1087   -0.0434    0.0173
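Rather than setting up a new system of equations for every lag, the partial autocorrelations can be obtained recursively from the autocorrelations. The Python sketch below uses the Durbin-Levinson recursion for this; that is a standard technique chosen for illustration, not a statement about how the values in the list were produced. The function name `pacf_from_acf` is hypothetical, and the input autocorrelations for Series 5 and Series 8 are built from the formulas in Section 3.3.2.2.

```python
def pacf_from_acf(rho, max_lag):
    """Partial autocorrelations pi_1..pi_max_lag from autocorrelations
    rho[0..max_lag] (rho[0] = 1), via the Durbin-Levinson recursion."""
    phi, v, out = [], 1.0, []
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(phi[j] * rho[k - 1 - j] for j in range(k - 1))
        last = num / v
        phi = [phi[j] - last * phi[k - 2 - j] for j in range(k - 1)] + [last]
        v *= 1.0 - last * last
        out.append(last)  # the last coefficient is the partial autocorrelation
    return out

# Series 5: Y_t = e_t + .8 e_{t-1}, rho(1) = .8/1.64, rho(j) = 0 for j > 1
rho5 = [1.0, 0.8 / 1.64, 0.0, 0.0, 0.0, 0.0]
# Series 8: rho(1) = 1.24/1.64, rho(j) = .6 rho(j-1) for j > 1
rho8 = [1.0, 1.24 / 1.64]
for _ in range(4):
    rho8.append(0.6 * rho8[-1])

pacf5 = pacf_from_acf(rho5, 5)
pacf8 = pacf_from_acf(rho8, 5)
print([round(p, 4) for p in pacf5])
print([round(p, 4) for p in pacf8])
```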


Plots of these values against lag number, with A used as a plot symbol for the ACF and P for the
PACF, are given in Output 3.2. A list of actual autocorrelations for Series 1-8 follows:

                                                              Lag
Series   Model                                      1         2         3         4         5
1        Y_t = .8Y_{t-1} + e_t                    0.8      0.64     0.512     0.410     0.328
2        Y_t = -.8Y_{t-1} + e_t                  -0.8      0.64    -0.512     0.410    -0.328
3        Y_t = .3Y_{t-1} + .4Y_{t-2} + e_t      0.500     0.550     0.365     0.330     0.245
4        Y_t = .7Y_{t-1} - .49Y_{t-2} + e_t     0.470    -0.161    -0.343    -0.161     0.055
5        Y_t = e_t + .8e_{t-1}                  0.488         0         0         0         0
6        Y_t = e_t - .3e_{t-1} - .4e_{t-2}     -0.144     -0.32         0         0         0
7        Y_t = e_t                                  0         0         0         0         0
8        Y_t = .6Y_{t-1} + e_t + .4e_{t-1}      0.756     0.454     0.272     0.163     0.098
Output 3.2 shows the plots.

Output 3.2 Plotting Actual Autocorrelations and Actual Partial Autocorrelations for Series 1-8
[Plots for Series 1-8 not reproduced; plot legend: ACF, PACF.]
3.3.2.4 Estimated ACF

Begin the PROC ARIMA analysis by estimating the three functions defined above. Use these
estimates to identify the form of the model. Define the estimated autocovariance $C(j)$ as

$$C(j) = \sum (Y_t - \bar{Y})(Y_{t+j} - \bar{Y})/n$$

where the summation is from 1 to $n-j$ and $\bar{Y}$ is the mean of the entire series. Define the estimated
autocorrelation by

$$r(j) = C(j)/C(0)$$

Compute standard errors for autocorrelations in PROC ARIMA as follows:

□ For autocorrelation $r(j)$, assign a variance $(\sum r^2(i))/n$ where the summation runs
from $-j+1$ to $j-1$.

□ The standard error is the square root of this variance.

□ This is the appropriate variance under the hypothesis that $\gamma(i) = 0$ for $i \ge j$ while
$\gamma(i) \ne 0$ for $i < j$.
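The estimated autocorrelation and its standard error can be sketched as follows. This is a minimal Python illustration of the formulas above, not PROC ARIMA's code, and the function names are hypothetical. The check at the end uses the first three estimated autocorrelations printed for Y1 in Output 3.5 (n = 150).

```python
import math

def sample_acf(y, max_lag):
    """Estimated autocorrelations r(0..max_lag), using divisor n in C(j)."""
    n = len(y)
    ybar = sum(y) / n
    c = [sum((y[t] - ybar) * (y[t + j] - ybar) for t in range(n - j)) / n
         for j in range(max_lag + 1)]
    return [cj / c[0] for cj in c]

def acf_std_errors(r, n):
    """Standard errors for lags 1..len(r)-1, where r[0] = 1.
    The sum of r(i)^2 over i = -j+1..j-1 equals 1 + 2*sum_{i=1}^{j-1} r(i)^2."""
    errors = []
    for j in range(1, len(r)):
        total = 1.0 + 2.0 * sum(r[i] ** 2 for i in range(1, j))
        errors.append(math.sqrt(total / n))
    return errors

r_y1 = [1.0, 0.76822, 0.57557, 0.40997]   # from the Y1 listing in Output 3.5
se_y1 = acf_std_errors(r_y1, 150)
print([round(s, 6) for s in se_y1])

r_small = sample_acf([10, 5, -3, -8, 1, 6], 2)
print([round(v, 4) for v in r_small])
```

The lag-1 standard error is always $n^{-1/2}$, because the sum then contains only $r^2(0) = 1$.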

The group of plots in Output 3.3 illustrates the actual (A) and estimated (E) ACFs for the series.
Each data series contains 150 observations. The purpose of the plots is to indicate the amount of
sampling error in the estimates.
Output 3.3 Plotting Actual and Estimated Autocorrelations for Series 1-8
[Plots for Series 1-8 not reproduced; plot legend: ACTUAL, ESTIMATED.]


3.3.2.5 Estimated PACF 

The partial autocorrelations are defined in Section 3.3.2.3 as solutions to equations involving the
covariances $\gamma(j)$. To estimate these partial autocorrelations, substitute estimated covariances $C(j)$
for the actual covariances and solve. For j large enough that the actual partial autocorrelation $\pi_j$ is 0
or nearly 0, an approximate standard error for the estimated partial autocorrelation is $n^{-1/2}$.

The next group of plots, in Output 3.4, illustrates the actual (A) and estimated (E) PACFs for the
series.

Output 3.4 Plotting Actual and Estimated Partial Autocorrelations for Series 1-8
[Plots for Series 1-8 not reproduced; plot legend: ACTUAL, ESTIMATED.]
3.3.2.6 IACF

The IACF of an ARMA(p,q) model is defined as the ACF of the ARMA(q,p) model you obtain if
you switch sides with the MA and AR operators. Thus, the inverse autocorrelation of

$$(1 - .8B)(Y_t - \mu) = e_t$$

is defined as the ACF of

$$Y_t - \mu = e_t - .8e_{t-1}$$

In the catalog of Series 1-8, for example, the IACF of Series 3 is the same as the ACF of Series 6
and vice versa.

3.3.2.7 Estimated IACF 

Suppose you know that a series comes from an AR(3) process. Fit an AR(3) model to obtain
estimated coefficients, for example,

$$Y_t - \mu = .300(Y_{t-1} - \mu) + .340(Y_{t-2} - \mu) - .120(Y_{t-3} - \mu) + e_t$$

The inverse model is the moving average

$$Y_t - \mu = e_t - .300e_{t-1} - .340e_{t-2} + .120e_{t-3}$$

The inverse autocovariances are estimated by

$$(1 + .300^2 + .340^2 + .120^2)\sigma^2$$

at lag 0,

$$(-.300 + (.300)(.340) - (.340)(.120))\sigma^2$$

at lag 1,

$$(-.340 - (.300)(.120))\sigma^2$$

at lag 2, and $.120\sigma^2$ at lag 3.
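The four inverse autocovariances are ordinary moving average autocovariances of the inverse model. A small Python sketch of that calculation follows; the function name `ma_autocovariances` is hypothetical, and $\sigma^2 = 1$ is assumed because it cancels when the values are converted to inverse autocorrelations.

```python
def ma_autocovariances(theta, sigma2=1.0):
    """Autocovariances gamma(0..q) of Y_t = sum_i theta[i] * e_{t-i},
    where theta[0] = 1 and q = len(theta) - 1."""
    q = len(theta) - 1
    return [sigma2 * sum(theta[i] * theta[i + h] for i in range(q + 1 - h))
            for h in range(q + 1)]

# Inverse model: Y_t - mu = e_t - .300 e_{t-1} - .340 e_{t-2} + .120 e_{t-3}
theta = [1.0, -0.300, -0.340, 0.120]
inv_cov = ma_autocovariances(theta)
inv_corr = [g / inv_cov[0] for g in inv_cov]

print([round(g, 4) for g in inv_cov])
print([round(r, 4) for r in inv_corr])
```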
In general, you do not know the order p of the process, nor do you know the form (it may be MA or
ARMA). Use the fact (see Section 3.3.1) that any invertible ARMA series can be represented as an
infinite-order AR and therefore can be approximated by an AR(p) with p large.

Set p to the minimum of the NLAG value and one-half the number of observations after differencing.
Then do the following:

□ Fit AR(p) to the data.

□ Using the estimated coefficients, compute covariances for corresponding MA series
as illustrated above for p = 3.

□ Assign standard errors of $n^{-1/2}$ to the resulting estimates.

3.3.3 Chi-Square Check of Residuals 

In the identification stage, PROC ARIMA uses the autocorrelations to form a statistic whose
approximate distribution is chi-square under the null hypothesis that the series is white noise. The
test is the Ljung modification of the Box-Pierce Q statistic. Both Q statistics are described in Box,
Jenkins, and Reinsel (1994) and the Ljung modification in Ljung and Box (1978, p. 297). The
formula for this statistic is

$$Q = n(n + 2) \sum_{j=1}^{k} r^2(j)/(n - j)$$

where $r(j)$ is the estimated autocorrelation at lag j and k can be any positive integer. In PROC
ARIMA several ks are used.

Later in the modeling stage, PROC ARIMA calculates the same statistic on the model residuals to 
test the hypothesis that they are white noise. The statistic is compared to critical values from a chi- 
square distribution. If your model is correct, the residuals should be white noise and the chi-square 
statistic should be small (the PROB value should be large). A significant chi-square statistic indicates 
that your model does not fit well. 
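The statistic is simple to compute once the estimated autocorrelations are available. The Python sketch below applies the formula to the first six autocorrelations printed for Y1 in Output 3.5 (n = 150); the function name `ljung_box_q` is hypothetical. Because the inputs are the rounded values from the listing, the result agrees with the printed chi-square of 202.72 only to about two decimals.

```python
def ljung_box_q(r, n, k):
    """Ljung-Box statistic n(n+2) * sum_{j=1}^{k} r(j)^2 / (n - j),
    where r[0] holds r(1), r[1] holds r(2), and so on."""
    return n * (n + 2) * sum(r[j - 1] ** 2 / (n - j) for j in range(1, k + 1))

# Estimated autocorrelations of Y1 at lags 1-6, taken from Output 3.5
r_y1 = [0.76822, 0.57557, 0.40997, 0.31599, 0.25144, 0.24839]
q_y1 = ljung_box_q(r_y1, n=150, k=6)
print(round(q_y1, 2))

# Compare with the 5% critical value of chi-square with 6 degrees of freedom
print(q_y1 > 12.59)
```

At the identification stage the degrees of freedom equal k; the value 12.59 is the 5% critical value for six degrees of freedom quoted in Section 3.4.1.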


3.3.4 Summary of Model Identification 

At the identification stage, you compute the ACF, PACF, and IACF. Behavior of the estimated 
functions is the key to model identification. The behavior of functions for different processes is 
summarized in the following table: 
Table 3.1 Summary of Model Identification

         MA(q)    AR(p)    ARMA(p,q)    White noise
ACF      D(q)     T        T            0
PACF     T        D(p)     T            0
IACF     T        D(p)     T            0

where

D(q) means the function drops off to 0 after lag q
T means the function tails off exponentially
0 means the function is 0 at all nonzero lags.


3.4 Examples and Instructions 

The following pages contain results for 150 observations generated from each of the eight sample 
series discussed earlier. Thus, the ACFs correspond to the Es in Output 3.3. Even with 150 
observations, considerable variation occurs. 

To obtain all of the output shown for the first series Y1, use these SAS statements:

   PROC ARIMA DATA=SERIES;
      IDENTIFY VAR=Y1 NLAG=10;
   RUN;

The VAR= option is required. The NLAG= option gives the number of autocorrelations to be
computed and defaults to 24. When you fit an ARIMA(p,d,q), NLAG+1 must be greater than p+d+q
to obtain initial parameter estimates. For the ARMA(p,q) models discussed so far, d is 0.

The following options can also be used: 

NOPRINT 

suppresses printout. This is useful because you must use an IDENTIFY statement prior 
to an ESTIMATE statement. If you have seen the output on a previous run, you may 
want to suppress it with this option. 

CENTER 

subtracts the series mean from each observation prior to the analysis. 

DATA=SASdataset

specifies the SAS data set to be analyzed (the default is the most recently created SAS
data set).

3.4.1 IDENTIFY Statement for Series 1-8

The following SAS statements, when used on the generated data, produce Output 3.5:

   PROC ARIMA DATA=SERIES;
      IDENTIFY VAR=Y1 NLAG=10;
      IDENTIFY VAR=Y2 NLAG=10;
      more SAS statements
      IDENTIFY VAR=Y8 NLAG=10;
   RUN;

Try to identify all eight of these series. These are presented in Section 3.3.2.1, so you can check your
diagnosis against the actual model. For example, look at Y6. First, observe that the calculated Q
statistic is 17.03, which would be compared to a chi-square distribution with six degrees of
freedom. The 5% critical value is 12.59, so you have significant evidence against the null hypothesis
that the considered model is adequate. Because no model is specified, this Q statistic simply tests the
hypothesis that the original data are white noise. The number 0.0092 is the area under the chi-square
distribution to the right of the calculated 17.03. Because 0.0092 is less than .05, without
recourse to a chi-square table, you see that 17.03 is to the right of the 5% critical value. Either way,
you decide that Y6 is not a white noise series. Contrast this with Y7, where the calculated statistic
2.85 has an area 0.8269 to its right; 2.85 is far to the left of the critical value and nowhere near
significance. Therefore, you decide that Y7 is a white noise series.

A model is needed for Y6. The PACF and IACF are nonzero through several lags, which means that 
an AR diagnosis requires perhaps seven lags. A model with few parameters is preferable. The ACF is 
near 0 after two lags, indicating that you may choose an MA(2). Because an MA model has a 
persistently nonzero PACF and IACF, the MA(2) diagnosis seems appropriate. At this stage, you 
have identified the form of the model and can assign the remainder of the analysis to PROC ARIMA. 
You must identify the model because PROC ARIMA does not do it automatically. 

The generated series has 150 observations; note the width of the standard error bands on the 
autocorrelations. Even with 150 observations, reading fine detail from the ACF is unlikely. Your goal 
is to use these functions to limit your search to a few plausible models rather than to pinpoint one 
model at the identification stage. 
Output 3.5 Using the IDENTIFY Statement for Series 1-8: PROC ARIMA

                      The ARIMA Procedure

                     Name of Variable = Y1

          Mean of Working Series     -0.83571
          Standard Deviation         1.610893
          Number of Observations          150

                        Autocorrelations

 Lag    Covariance    Correlation    Std Error
   0      2.594976        1.00000            0
   1      1.993518        0.76822     0.081650
   2      1.493601        0.57557     0.120563
   3      1.063870        0.40997     0.137669
   4      0.819993        0.31599     0.145581
   5      0.652487        0.25144     0.150084
   6      0.644574        0.24839     0.152866
   7      0.637198        0.24555     0.155534
   8      0.609458        0.23486     0.158097
   9      0.504567        0.19444     0.160406
  10      0.372414        0.14351     0.161970

                    Inverse Autocorrelations

 Lag    Correlation
   1       -0.48474
   2       -0.03700
   3        0.07242
   4       -0.06915
   5        0.08596
   6       -0.04379
   7        0.00593
   8       -0.02651
   9       -0.00946
  10        0.02004

                    Partial Autocorrelations

 Lag    Correlation
   1        0.76822
   2       -0.03560
   3       -0.05030
   4        0.06531
   5        0.01599
   6        0.11264
   7        0.02880
   8        0.00625
   9       -0.03668
  10       -0.03308

              Autocorrelation Check for White Noise

  To     Chi-            Pr >
 Lag    Square    DF    ChiSq    --------------Autocorrelations--------------
   6    202.72     6   <.0001    0.768   0.576   0.410   0.316   0.251   0.248
Output 3.5 Using the IDENTIFY Statement for Series 1-8: PROC ARIMA (continued)

                      The ARIMA Procedure

                     Name of Variable = Y2

          Mean of Working Series     -0.07304
          Standard Deviation         1.740946
          Number of Observations          150

                        Autocorrelations

 Lag    Covariance    Correlation    Std Error
   0      3.030893        1.00000            0
   1     -2.414067        -.79649     0.081650
   2      1.981819        0.65387     0.122985
   3     -1.735348        -.57255     0.144312
   4      1.454755        0.47998     0.158735
   5     -1.242813        -.41005     0.168132
   6      1.023028        0.33753     0.174672
   7     -0.844730        -.27871     0.178968
   8      0.790137        0.26069     0.181838
   9     -0.623423        -.20569     0.184313
  10      0.494691        0.16322     0.185837

                    Inverse Autocorrelations

 Lag    Correlation
   1        0.50058
   2        0.08442
   3        0.10032
   4        0.03955
   5        0.02433
   6       -0.01680
   7       -0.10020
   8       -0.12236
   9       -0.05245
  10       -0.00211

                    Partial Autocorrelations

 Lag    Correlation
   1       -0.79649
   2        0.05329
   3       -0.10166
   4       -0.03975
   5       -0.02340
   6       -0.04177
   7       -0.00308
   8        0.07566
   9        0.08029
  10        0.00344

              Autocorrelation Check for White Noise

  To     Chi-            Pr >
 Lag    Square    DF    ChiSq    --------------Autocorrelations--------------
   6    294.24     6   <.0001   -0.796   0.654  -0.573   0.480  -0.410   0.338
Output 3.5 Using the IDENTIFY Statement for Series 1-8: PROC ARIMA (continued)

                      The ARIMA Procedure

                     Name of Variable = Y3

          Mean of Working Series     -0.55064
          Standard Deviation         1.237272
          Number of Observations          150

                        Autocorrelations

 Lag    Covariance    Correlation    Std Error
   0      1.530842        1.00000            0
   1      0.693513        0.45303     0.081650
   2      0.756838        0.49439     0.096970
   3      0.395653        0.25845     0.112526
   4      0.417928        0.27301     0.116416
   5      0.243252        0.15890     0.120609
   6      0.311005        0.20316     0.121997
   7      0.274850        0.17954     0.124232
   8      0.295125        0.19279     0.125950
   9      0.212710        0.13895     0.127902
  10      0.154864        0.10116     0.128904

                    Inverse Autocorrelations

 Lag    Correlation
   1       -0.17754
   2       -0.30226
   3        0.04442
   4       -0.02891
   5        0.07177
   6        0.00400
   7       -0.04363
   8       -0.06394
   9       -0.00347
  10        0.03944

                    Partial Autocorrelations

 Lag    Correlation
   1        0.45303
   2        0.36383
   3       -0.07085
   4        0.04953
   5       -0.00276
   6        0.07517
   7        0.07687
   8        0.03254
   9       -0.02622
  10       -0.04933

              Autocorrelation Check for White Noise

  To     Chi-            Pr >
 Lag    Square    DF    ChiSq    --------------Autocorrelations--------------
   6    101.56     6   <.0001    0.453   0.494   0.258   0.273   0.159   0.203
Output 3.5 Using the IDENTIFY Statement for Series 1-8: PROC ARIMA (continued)

The ARIMA Procedure
Name of Variable = Y4

Mean of Working Series    -0.21583
Standard Deviation        1.381192
Number of Observations         150

Autocorrelations

Lag    Covariance    Correlation    Std Error
  0      1.907692        1.00000     0
  1      0.935589        0.49043     0.081650
  2     -0.297975        -.15620     0.099366
  3     -0.810601        -.42491     0.100990
  4     -0.546360        -.28640     0.112278
  5     -0.106682        -.05592     0.117047
  6      0.237817        0.12466     0.117225
  7      0.324887        0.17030     0.118105
  8      0.241111        0.12639     0.119731
  9      0.055065        0.02886     0.120617
 10     -0.073198        -.03837     0.120663

Inverse Autocorrelations

Lag    Correlation
  1       -0.57307
  2        0.22767
  3        0.10911
  4       -0.09954
  5        0.09918
  6       -0.06612
  7        0.03872
  8       -0.05099
  9        0.02259
 10       -0.01998

Partial Autocorrelations

Lag    Correlation
  1        0.49043
  2       -0.52236
  3       -0.09437
  4       -0.02632
  5       -0.09683
  6        0.05711
  7        0.00687
  8        0.05190
  9        0.00919
 10        0.03505

Autocorrelation Check for White Noise

 To     Chi-            Pr >
Lag   Square    DF     ChiSq    ---------------Autocorrelations---------------
  6    84.33     6    <.0001     0.490  -0.156  -0.425  -0.286  -0.056   0.125
Output 3.5 Using the IDENTIFY Statement for Series 1-8: PROC ARIMA (continued)

The ARIMA Procedure
Name of Variable = Y5

Mean of Working Series    -0.30048
Standard Deviation        1.316518
Number of Observations         150

Autocorrelations

Lag    Covariance    Correlation    Std Error
  0      1.733219        1.00000     0
  1      0.852275        0.49173     0.081650
  2     -0.055217        -.03186     0.099452
  3     -0.200380        -.11561     0.099520
  4     -0.203287        -.11729     0.100411
  5     -0.144763        -.08352     0.101320
  6      0.011068        0.00639     0.101778
  7      0.163554        0.09436     0.101781
  8      0.234861        0.13551     0.102363
  9      0.141452        0.08161     0.103551
 10     0.0013709        0.00079     0.103979

Inverse Autocorrelations

Lag    Correlation
  1       -0.69306
  2        0.44973
  3       -0.24954
  4        0.13893
  5       -0.04150
  6        0.00044
  7        0.01466
  8       -0.04225
  9        0.01048
 10       -0.00262

Partial Autocorrelations

Lag    Correlation
  1        0.49173
  2       -0.36093
  3        0.12615
  4       -0.17086
  5        0.05473
  6        0.00681
  7        0.08204
  8        0.05577
  9       -0.01303
 10        0.00517

Autocorrelation Check for White Noise

 To     Chi-            Pr >
Lag   Square    DF     ChiSq    ---------------Autocorrelations---------------
  6    42.48     6    <.0001     0.492  -0.032  -0.116  -0.117  -0.084   0.006
Output 3.5 Using the IDENTIFY Statement for Series 1-8: PROC ARIMA (continued)

The ARIMA Procedure
Name of Variable = Y6

Mean of Working Series    -0.04253
Standard Deviation        1.143359
Number of Observations         150

Autocorrelations

Lag    Covariance    Correlation    Std Error
  0      1.307271        1.00000     0
  1     -0.137276        -.10501     0.081650
  2     -0.385340        -.29477     0.082545
  3     -0.118515        -.09066     0.089287
  4     0.0083104        0.00636     0.089899
  5     -0.084843        -.06490     0.089902
  6      0.011812        0.00904     0.090214
  7      0.045677        0.03494     0.090220
  8      0.119262        0.09123     0.090310
  9      0.018882        0.01444     0.090922
 10     -0.083572        -.06393     0.090937

Inverse Autocorrelations

Lag    Correlation
  1        0.54503
  2        0.57372
  3        0.44261
  4        0.33790
  5        0.28173
  6        0.16162
  7        0.10979
  8        0.03715
  9        0.02286
 10        0.02606

Partial Autocorrelations

Lag    Correlation
  1       -0.10501
  2       -0.30920
  3       -0.18297
  4       -0.14923
  5       -0.20878
  6       -0.13688
  7       -0.12493
  8       -0.00862
  9       -0.01255
 10       -0.04518

Autocorrelation Check for White Noise

 To     Chi-            Pr >
Lag   Square    DF     ChiSq    ---------------Autocorrelations---------------
  6    17.03     6    0.0092    -0.105  -0.295  -0.091   0.006  -0.065   0.009
Output 3.5 Using the IDENTIFY Statement for Series 1-8: PROC ARIMA (continued)

The ARIMA Procedure
Name of Variable = Y7

Mean of Working Series    -0.15762
Standard Deviation        1.023007
Number of Observations         150

Autocorrelations

Lag    Covariance    Correlation    Std Error
  0      1.046543        1.00000     0
  1      0.019680        0.01880     0.081650
  2     -0.012715        -.01215     0.081679
  3     -0.107313        -.10254     0.081691
  4     -0.012754        -.01219     0.082544
  5     -0.085250        -.08146     0.082556
  6      0.023489        0.02244     0.083090
  7      0.048176        0.04603     0.083131
  8      0.106544        0.10181     0.083300
  9      0.033337        0.03185     0.084126
 10     -0.026272        -.02510     0.084206

Inverse Autocorrelations

Lag    Correlation
  1       -0.00490
  2        0.02012
  3        0.08424
  4       -0.00793
  5        0.06817
  6       -0.02208
  7       -0.03545
  8       -0.08197
  9       -0.03324
 10        0.02062

Partial Autocorrelations

Lag    Correlation
  1        0.01880
  2       -0.01251
  3       -0.10213
  4       -0.00867
  5       -0.08435
  6        0.01474
  7        0.04159
  8        0.08562
  9        0.03364
 10       -0.02114

Autocorrelation Check for White Noise

 To     Chi-            Pr >
Lag   Square    DF     ChiSq    ---------------Autocorrelations---------------
  6     2.85     6    0.8269     0.019  -0.012  -0.103  -0.012  -0.081   0.022
Output 3.5 Using the IDENTIFY Statement for Series 1-8: PROC ARIMA (continued)

The ARIMA Procedure
Name of Variable = Y8

Mean of Working Series    -0.57405
Standard Deviation        1.591833
Number of Observations         150

Autocorrelations

Lag    Covariance    Correlation    Std Error
  0      2.533932        1.00000     0
  1      1.848193        0.72938     0.081650
  2      0.946216        0.37342     0.117303
  3      0.352595        0.13915     0.124976
  4      0.086093        0.03398     0.126005
  5      0.025473        0.01005     0.126066
  6      0.150883        0.05955     0.126071
  7      0.295444        0.11659     0.126259
  8      0.359400        0.14183     0.126975
  9      0.279258        0.11021     0.128026
 10      0.142827        0.05637     0.128657

Inverse Autocorrelations

Lag    Correlation
  1       -0.65293
  2        0.20188
  3        0.00095
  4       -0.06691
  5        0.08986
  6       -0.06873
  7        0.04555
  8       -0.04728
  9        0.01718
 10        0.00064

Partial Autocorrelations

Lag    Correlation
  1        0.72938
  2       -0.33883
  3        0.05222
  4        0.00753
  5        0.01918
  6        0.11087
  7        0.02107
  8        0.02566
  9       -0.03977
 10       -0.00138

Autocorrelation Check for White Noise

 To     Chi-            Pr >
Lag   Square    DF     ChiSq    ---------------Autocorrelations---------------
  6   106.65     6    <.0001     0.729   0.373   0.139   0.034   0.010   0.060
3.4.2 Example: Iron and Steel Export Analysis

The U.S. iron and steel export yearly series (Fairchild Publications 1981) graphed in Output 3.6 is a 
good illustration of model identification. 


Output 3.6 Plotting a Yearly Series

[Plot titled IRON AND STEEL EXPORTS EXCLUDING SCRAPS, WEIGHT IN MILLION TONS, 1937-1980; vertical axis EXPORT, horizontal axis YEAR]


The following statements produce the results in Output 3.7: 


PROC ARIMA DATA=STEEL; 

IDENTIFY VAR=EXPORT NLAG=10; 
RUN; 
Although the Q statistic fails by a slim margin to be significant, the lag 1 autocorrelation,
0.47193, is beyond the two standard error bands (2 x 0.150756, or about 0.30). Thus, you want to
fit a model despite the Q value. From the ACF, it appears that an MA(1) is appropriate. From the
PACF and IACF, an AR(1) also appears consistent with these data. You can fit both and select the
one with the smaller error mean square. To fit the MA(1) model, use the statement

ESTIMATE Q=1; 

For the AR(1) model use the statement 

ESTIMATE P=1; 
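
As a minimal sketch of the comparison just described, both candidate models can be fit in a single PROC ARIMA step. The data set STEEL and the variable EXPORT are those used in the text. It is an assumption here that you compare the two fits by reading the "Variance Estimate" line that each ESTIMATE statement prints, since that line reports the error mean square.

```sas
/* Sketch: fit both candidate models in one PROC ARIMA step.      */
/* Compare the "Variance Estimate" lines of the two fits.         */
PROC ARIMA DATA=STEEL;
   IDENTIFY VAR=EXPORT NOPRINT;   /* required before any ESTIMATE */
   ESTIMATE Q=1;                  /* MA(1) candidate              */
   ESTIMATE P=1;                  /* AR(1) candidate              */
RUN;
```

Each ESTIMATE statement refits the series named in the most recent IDENTIFY statement, so the two fits are independent of each other.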


Output 3.7 Identifying a Model Using the IDENTIFY Statement

IRON AND STEEL EXPORTS EXCLUDING SCRAPS
WEIGHT IN MILLION TONS
1937-1980

The ARIMA Procedure
Name of Variable = EXPORT

Mean of Working Series    4.418182
Standard Deviation         1.73354
Number of Observations          44

Autocorrelations

Lag    Covariance    Correlation    Std Error
  0      3.005160        1.00000     0
  1      1.418238        0.47193     0.150756
  2      0.313839        0.10443     0.181248
  3      0.133835        0.04453     0.182611
  4      0.310097        0.10319     0.182858
  5      0.296534        0.09867     0.184176
  6      0.024517        0.00816     0.185374
  7     -0.159424        -.05305     0.185382
  8     -0.299770        -.09975     0.185727
  9     -0.247158        -.08224     0.186940
 10     -0.256881        -.08548     0.187761

Inverse Autocorrelations

Lag    Correlation
  1       -0.48107
  2        0.14768
  3       -0.01309
  4       -0.03053
  5       -0.05510
  6        0.04941
  7       -0.04857
  8        0.07991
  9       -0.03744
 10        0.04236
Output 3.7 Identifying a Model Using the IDENTIFY Statement (continued)

IRON AND STEEL EXPORTS EXCLUDING SCRAPS
WEIGHT IN MILLION TONS
1937-1980

The ARIMA Procedure

Partial Autocorrelations

Lag    Correlation
  1        0.47193
  2       -0.15218
  3        0.07846
  4        0.08185
  5        0.01053
  6       -0.05594
  7       -0.03333
  8       -0.08310
  9       -0.01156
 10       -0.05715

Autocorrelation Check for White Noise

 To     Chi-            Pr >
Lag   Square    DF     ChiSq    ---------------Autocorrelations---------------
  6    12.15     6    0.0586     0.472   0.104   0.045   0.103   0.099   0.008

Suppose you overfit, using an MA(2) as an initial step. Specify these statements: 

PROC ARIMA DATA=STEEL; 

IDENTIFY VAR=EXPORT NOPRINT; 

ESTIMATE Q=2; 

RUN; 


Any ESTIMATE statement must be preceded by an IDENTIFY statement. In this example,
NOPRINT suppresses the printout of the ACF, IACF, and PACF. Note that the Q statistics in Output
3.8 are quite small, indicating a good fit for the MA(2) model. However, when you examine the
parameter estimates and their t statistics, you see that more parameters were fit than necessary. An
MA(1) model is appropriate because the t statistic for the lag 2 parameter is only -0.85. Also, because
of the large t value, -3.60, associated with the lag 1 coefficient, it is wise to ignore the fact that the
previous Q was insignificant. In Output 3.7 the Q was calculated from six autocorrelations, and the
effect of the large lag 1 autocorrelation was diminished by the other five small autocorrelations.
Output 3.8 Fitting an MA(2) Model with the ESTIMATE Statement

IRON AND STEEL EXPORTS EXCLUDING SCRAPS
WEIGHT IN MILLION TONS
1937-1980

The ARIMA Procedure
Conditional Least Squares Estimation

                         Standard                 Approx
Parameter    Estimate       Error    t Value    Pr > |t|    Lag
MU            4.43400     0.39137      11.33      <.0001      0
MA1,1        -0.56028     0.15542      -3.60      0.0008      1
MA1,2        -0.13242     0.15535      -0.85      0.3990      2

Constant Estimate      4.433999
Variance Estimate      2.433068
Std Error Estimate     1.559829
AIC                    166.8821
SBC                    172.2347
Number of Residuals          44
* AIC and SBC do not include log determinant.

Correlations of Parameter Estimates

Parameter        MU     MA1,1     MA1,2
MU            1.000    -0.013    -0.011
MA1,1        -0.013     1.000     0.492
MA1,2        -0.011     0.492     1.000

Autocorrelation Check of Residuals

 To     Chi-            Pr >
Lag   Square    DF     ChiSq    ---------------Autocorrelations---------------
  6     0.58     4    0.9653    -0.002  -0.006   0.006   0.060   0.081  -0.032
 12     2.81    10    0.9855     0.005  -0.077  -0.035   0.008  -0.163   0.057
 18     6.24    16    0.9853     0.066  -0.005   0.036  -0.098   0.123  -0.125
 24    12.10    22    0.9553    -0.207  -0.086  -0.102  -0.068   0.025  -0.060

Model for Variable EXPORT

Estimated Mean    4.433999

Moving Average Factors
Factor 1: 1 + 0.56028 B**(1) + 0.13242 B**(2)
You now fit an MA(1) model using these statements:

PROC ARIMA DATA=STEEL; 

IDENTIFY VAR=EXPORT NOPRINT; 

ESTIMATE Q=1; 

RUN; 

The results are shown in Output 3.9. The Q statistics are still small, so you have no evidence of a
lack of fit for the order 1 MA model. The estimated model is now

$$Y_t = 4.421 + e_t + .4983e_{t-1}$$

Output 3.9 Fitting an MA(1) Model with the ESTIMATE Statement

IRON AND STEEL EXPORTS EXCLUDING SCRAPS
WEIGHT IN MILLION TONS
1937-1980

The ARIMA Procedure
Conditional Least Squares Estimation

                         Standard                 Approx
Parameter    Estimate       Error    t Value    Pr > |t|    Lag
MU            4.42102     0.34703      12.74      <.0001      0
MA1,1        -0.49827     0.13512      -3.69      0.0006      1

Constant Estimate      4.421016
Variance Estimate      2.412583
Std Error Estimate     1.553249
AIC                    165.5704
SBC                    169.1388
Number of Residuals          44
* AIC and SBC do not include log determinant.

Correlations of Parameter Estimates

Parameter        MU     MA1,1
MU            1.000    -0.008
MA1,1        -0.008     1.000

Autocorrelation Check of Residuals

 To     Chi-            Pr >
Lag   Square    DF     ChiSq    ---------------Autocorrelations---------------
  6     1.31     5    0.9336     0.059   0.094  -0.028   0.085   0.075  -0.020
 12     3.23    11    0.9873    -0.006  -0.079  -0.052  -0.013  -0.146   0.039
 18     6.68    17    0.9874     0.063  -0.001   0.044  -0.092   0.096  -0.149
 24    14.00    23    0.9268    -0.206  -0.135  -0.114  -0.084   0.014  -0.072

Model for Variable EXPORT

Estimated Mean    4.421016

Moving Average Factors
Factor 1: 1 + 0.49827 B**(1)









3.4.3 Estimation Methods Used in PROC ARIMA 

How does PROC ARIMA estimate this MA coefficient? As in the AR case, three techniques are 
available: 

□ conditional least squares (CLS) 

□ unconditional least squares (ULS) 

□ maximum likelihood (ML). 

In the CLS method you attempt to minimize

$$\sum_{t=p+1}^{n} e_t^2$$

where p is the order of the AR part of the process and $e_t$ is a residual. In the example,

$$e_t = Y_t - \hat{\mu} + \hat{\beta}\,e_{t-1}$$

where $\hat{\mu}$ and $\hat{\beta}$ are parameter estimates. Begin by assuming $e_0 = 0$. ARIMA computations indicate
that $\hat{\mu} = 4.421$ and $\hat{\beta} = -.4983$ provide the minimum for the iron export data.
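
The recursion above can be sketched in a DATA step. This is only an illustration under stated assumptions: the output data set name CLSRES and the variables E and SSE are hypothetical, and the estimates are the rounded values quoted in the text rather than values read from PROC ARIMA.

```sas
/* Sketch: CLS residuals for the MA(1) fit, starting from e_0 = 0. */
/* CLSRES, E, and SSE are illustrative names, not from the text.   */
DATA CLSRES;
   SET STEEL;
   RETAIN E 0 SSE 0;                 /* E holds e(t-1) on entry     */
   E   = EXPORT - 4.421 - 0.4983*E;  /* e(t) = Y(t) - mu + beta*e(t-1), beta = -.4983 */
   SSE = SSE + E**2;                 /* running sum of squares      */
RUN;
```

The value of SSE on the last observation is the conditional sum of squares that the CLS method tries to make as small as possible.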

To illustrate further, suppose you are given data $Y_1, Y_2, \ldots, Y_6$, where you assume

$$Y_t = e_t - \beta e_{t-1}$$

Suppose you want to estimate $\beta$ from the data given below:

                        t=1      t=2      t=3      t=4      t=5      t=6    Sum of squares
Y_t                     -12       -3        7        9        4       -7
e_t(-0.29)              -12     0.48     6.86     7.01     1.97    -7.57            301.62
e_t(-0.30)              -12     0.60     6.82     6.95     1.91    -7.57            300.13
W_t(-0.30)                0       12       -4       -6       -6        0



You find that

$$\hat{\gamma}(0) = 57.89$$

and

$$\hat{\gamma}(1) = 15.26$$

Solving

$$\hat{\rho}(1) = -\hat{\beta} / (1 + \hat{\beta}^2) = .2636$$

yields the initial estimate $\hat{\beta} = -.29$. Compute

$$\hat{e}_t = Y_t - .29\,\hat{e}_{t-1}$$









Starting with $e_0 = 0$, values of $\hat{e}_t$ are listed under the $Y_t$ values. Thus,

$$\hat{e}_1 = Y_1 - .29(0) = -12$$
$$\hat{e}_2 = -3 - .29(-12) = .48$$
$$\hat{e}_3 = 7 - .29(.48) = 6.86$$

and thus

$$\hat{e}_1^2 + \hat{e}_2^2 + \cdots + \hat{e}_6^2 = 301.62$$

Perhaps you can improve upon $\hat{\beta} = -.29$. For example, using $\hat{\beta} = -.30$, you can add a second row of
$\hat{e}_t$ values to the previous list and thus compute

$$\hat{e}_1^2 + \hat{e}_2^2 + \cdots + \hat{e}_6^2 = 300.13$$
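
The two rows of residuals can be reproduced with a short DATA step. This is a sketch only: the data set name CLSDEMO and the variable names are hypothetical, and the six data values are those listed in the text.

```sas
/* Sketch: residuals and running sums of squares for two trial     */
/* values of beta, starting from e_0 = 0. Names are illustrative.  */
DATA CLSDEMO;
   INPUT Y @@;
   RETAIN E29 0 E30 0 SS29 0 SS30 0;
   E29  = Y - 0.29*E29;        /* residual for beta = -.29 */
   E30  = Y - 0.30*E30;        /* residual for beta = -.30 */
   SS29 = SS29 + E29**2;       /* running sum of squares   */
   SS30 = SS30 + E30**2;
   DATALINES;
-12 -3 7 9 4 -7
;
RUN;

PROC PRINT DATA=CLSDEMO;
RUN;
```

Because this step carries full precision rather than the two-decimal table entries, the final sums come out near 301.63 and 300.26. The text's 301.62 and 300.13 result from squaring the rounded residuals.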

Moving $\hat{\beta}$ from -.29 to -.30 gives a smaller sum of squares, so you would like to continue in that
direction, but by how much? Letting $\beta_0$ be the true value of the parameter, you can use Taylor's
series expansion to write

$$e_t(\beta_0) = e_t(\hat{\beta}) - W_t(\hat{\beta})(\beta_0 - \hat{\beta}) + R_t \qquad (3.2)$$

where $-W_t$ is the derivative of $e_t$ with respect to $\beta$ and $R_t$ is a remainder term. Rearranging equation
3.2 and ignoring the remainder yields

$$e_t(\hat{\beta}) = W_t(\hat{\beta})(\beta_0 - \hat{\beta}) + e_t(\beta_0)$$

Because $e_t(\beta_0)$ is white noise, this looks like a regression equation that you can use to estimate
$\beta_0 - \hat{\beta}$. You need to compute the derivative $W_t$. Derivatives are defined as limits, for example,

$$-W_t(\hat{\beta}) = \lim_{\delta \to 0} \left( e_t(\hat{\beta} + \delta) - e_t(\hat{\beta}) \right) / \delta$$

You have now computed $e_t(-.29)$ and $e_t(-.30)$, so you can approximate $W_t$ by

$$-\left( e_t(-.29) - e_t(-.30) \right) / .01$$

as in the third row of the table above, where $\delta = .01$ and $\hat{\beta} = -.30$. In PROC ARIMA, $\delta = .001$ unless
otherwise specified. Now, regressing $e_t(-.30)$ on $W_t(-.30)$ gives a coefficient

$$\frac{(.60)(12) - (6.82)(4) - (6.95)(6) - (1.91)(6) - (7.57)(0)}{(12)^2 + (4)^2 + (6)^2 + (6)^2 + (0)^2} = -0.3157$$

This is an estimate of

$$\beta_0 - \hat{\beta} = \beta_0 + .30$$

so you compute a new estimate of $\beta$ by

$$-.30 - 0.3157 = -.6157$$
This estimate of $\beta$ results in a lower sum of squares,

$$\sum \hat{e}_t^2(-.6157) = 271.87$$

Using $\hat{\beta} = -.6157$ as an initial value, you can again compute an improvement. Continue iterating the
estimation improvement technique until the changes $\Delta\hat{\beta}$ become small. For this data set, $\hat{\beta} = -.6618$
appears to minimize the sum of squares at 271.153.
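
One improvement step of this kind can be sketched with a DATA step followed by a no-intercept regression. This is not the algorithm PROC ARIMA itself runs, only an illustration of the principle. The data set name GNSTEP and the variables E_A, E_B, and W are hypothetical.

```sas
/* Sketch: one improvement step for beta, starting from -.30.      */
/* W approximates the derivative with delta = .01, as in the text. */
DATA GNSTEP;
   INPUT Y @@;
   RETAIN E_A 0 E_B 0;
   E_A = Y - 0.29*E_A;           /* residual at beta = -.29        */
   E_B = Y - 0.30*E_B;           /* residual at beta = -.30        */
   W   = -(E_A - E_B)/0.01;      /* numerical derivative term      */
   DATALINES;
-12 -3 7 9 4 -7
;
RUN;

/* Regress e(-.30) on W with no intercept; the slope estimates     */
/* beta_0 - (-.30), so the new estimate is -.30 plus the slope.    */
PROC REG DATA=GNSTEP;
   MODEL E_B = W / NOINT;
RUN;
QUIT;
```

Because this sketch uses unrounded residuals, the slope it reports differs slightly from the -0.3157 obtained in the text from the rounded table entries. Repeating the step from each new estimate is the iteration described above.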

You can extend this method to higher-order and mixed processes. The technique used in PROC 
ARIMA is more sophisticated than the one given here, but it operates under the same principle. The 
METHOD=ULS technique more accurately computes prediction error variances and finite sample 
predictions than METHOD=CLS. METHOD=CLS assumes a constant variance and the same linear 
combination of past values as the optimum prediction. Also, when you specify METHOD=ML, the 
quantity to be minimized is not the sum of squares; instead, it is the negative log of the likelihood 
function. Although CLS, ULS, and ML should give similar results for reasonably large data sets, 
studies comparing the three methods indicate that ML is the most accurate. Initial values are 
computed from the Yule-Walker equations for the first round of the iterative procedure as in the 
example above. See also Section 2.2.1. 
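
To compare the three estimation methods on the same series, a minimal sketch is to repeat the ESTIMATE statement with each METHOD= value. The choice of the MA(1) model for the STEEL data is carried over from Section 3.4.2 for illustration only.

```sas
/* Sketch: the same MA(1) model under the three estimation methods. */
PROC ARIMA DATA=STEEL;
   IDENTIFY VAR=EXPORT NOPRINT;
   ESTIMATE Q=1 METHOD=CLS;    /* conditional least squares (default) */
   ESTIMATE Q=1 METHOD=ULS;    /* unconditional least squares         */
   ESTIMATE Q=1 METHOD=ML;     /* maximum likelihood                  */
RUN;
```

With only 44 observations the three sets of estimates may differ noticeably, which is consistent with the remark that the methods agree closely only for reasonably large data sets.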


3.4.4 ESTIMATE Statement for Series 8 

Finally, reexamine the generated series $Y_8$,

$$Y_t - .6Y_{t-1} = e_t + .4e_{t-1}$$

The following statements produce Output 3.10: 

PROC ARIMA DATA=SERIES; 

IDENTIFY VAR=Y8 NOPRINT; 

ESTIMATE P=1 Q=1 PRINTALL GRID; 

ESTIMATE P=2 Q=2; 

RUN; 

The PRINTALL option shows the iterations. Because the iterations stop when the changes in
parameter estimates are small, you have no guarantee that the final parameter estimates have
minimized the residual sum of squares (or maximized the likelihood). To check this, use the GRID
option to evaluate the sum of squares (or likelihood) on a grid surrounding the final parameter
estimates. Examine the three grids in Output 3.10 and verify that the middle sum of squares,
164.77, is the smallest of the nine tabulated values. For example, increasing the AR estimate .52459
to .52959 and decreasing the MA estimate -.32122 to -.32622 increases the sum of squares from
164.77 to 164.79. A warning message associated with the last ESTIMATE statement indicates that the
procedure could not find estimates that minimized the error sum of squares because excess lags are
specified on both sides of the ARMA model.
Output 3.10 Using the ESTIMATE Statement for Series 8: PROC ARIMA

The ARIMA Procedure

Preliminary Estimation

Initial Autoregressive Estimates
Estimate
  1     0.53588

Initial Moving Average Estimates
Estimate
  1    -0.27217

Constant Term Estimate       -0.17285
White Noise Variance Est     1.117013

Conditional Least Squares Estimation

Iteration      SSE          MU       MA1,1       AR1,1    Constant     Lambda      R Crit
    0       165.16    -0.37242    -0.27217     0.53588    -0.17285    0.00001           1
    1       164.78    -0.29450    -0.31415     0.52661    -0.13941       1E-6     0.04534
    2       164.77    -0.28830    -0.31998     0.52519    -0.13689       1E-7    0.005519
    3       164.77    -0.28762    -0.32102     0.52469    -0.13671       1E-8    0.000854
    4       164.77    -0.28756    -0.32122     0.52459    -0.13671       1E-9    0.000151

ARIMA Estimation Optimization Summary

Estimation Method                          Conditional Least Squares
Parameters Estimated                       3
Termination Criteria                       Maximum Relative Change in Estimates
Iteration Stopping Value                   0.001
Criteria Value                             0.000604
Alternate Criteria                         Relative Change in Objective Function
Alternate Criteria Value                   4.782E-8
Maximum Absolute Value of Gradient         0.020851
R-Square Change from Last Iteration        0.000151
Objective Function                         Sum of Squared Residuals
Objective Function Value                   164.7716
Marquardt's Lambda Coefficient             1E-9
Numerical Derivative Perturbation Delta    0.001
Iterations                                 4
Output 3.10 Using the ESTIMATE Statement for Series 8: PROC ARIMA (continued)

Conditional Least Squares Estimation

                         Standard                 Approx
Parameter    Estimate       Error    t Value    Pr > |t|    Lag
MU           -0.28756     0.23587      -1.22      0.2247      0
MA1,1        -0.32122     0.10810      -2.97      0.0035      1
AR1,1         0.52459     0.09729       5.39      <.0001      1

Constant Estimate      -0.13671
Variance Estimate      1.120895
Std Error Estimate     1.058724
AIC                    445.7703
SBC                    454.8022
Number of Residuals         150
* AIC and SBC do not include log determinant.

Correlations of Parameter Estimates

Parameter        MU     MA1,1     AR1,1
MU            1.000     0.016     0.054
MA1,1         0.016     1.000     0.690
AR1,1         0.054     0.690     1.000

Autocorrelation Check of Residuals

 To     Chi-            Pr >
Lag   Square    DF     ChiSq    ---------------Autocorrelations---------------
  6     2.08     4    0.7217    -0.006   0.003   0.039  -0.067   0.082  -0.022
 12     3.22    10    0.9758    -0.059  -0.029   0.010  -0.037   0.034  -0.015
 18     5.79    16    0.9902     0.004   0.106   0.051   0.038   0.002   0.004
 24    11.75    22    0.9623    -0.035   0.099  -0.050   0.007   0.001  -0.140
 30    13.19    28    0.9920    -0.043  -0.059   0.004   0.034  -0.011  -0.034

SSE Surface on Grid Near Estimates: MA1,1 (y8)

MU (y8)      -0.32622    -0.32122    -0.31622
-0.29256       164.78      164.77      164.78
-0.28756       164.78      164.77      164.78
-0.28256       164.78      164.77      164.78
Output 3.10 Using the ESTIMATE Statement for Series 8: PROC ARIMA (continued)

SSE Surface on Grid Near Estimates: AR1,1 (y8)

MU (y8)       0.51959     0.52459     0.52959
-0.29256       164.78      164.77      164.78
-0.28756       164.78      164.77      164.78
-0.28256       164.78      164.77      164.78

SSE Surface on Grid Near Estimates: AR1,1 (y8)

MA1,1 (y8)    0.51959     0.52459     0.52959
-0.32622       164.77      164.78      164.79
-0.32122       164.78      164.77      164.78
-0.31622       164.79      164.78      164.77

Model for Variable y8

Estimated Mean    -0.28756

Autoregressive Factors
Factor 1: 1 - 0.52459 B**(1)

Moving Average Factors
Factor 1: 1 + 0.32122 B**(1)

WARNING: The model defined by the new estimates is unstable. The iteration process has been
terminated.

WARNING: Estimates may not have converged.

ARIMA Estimation Optimization Summary

Estimation Method                          Conditional Least Squares
Parameters Estimated                       5
Termination Criteria                       Maximum Relative Change in Estimates
Iteration Stopping Value                   0.001
Criteria Value                             1.627249
Maximum Absolute Value of Gradient         144.3651
R-Square Change from Last Iteration        0.118066
Objective Function                         Sum of Squared Residuals
Objective Function Value                   164.6437
Marquardt's Lambda Coefficient             0.00001
Numerical Derivative Perturbation Delta    0.001
Iterations                                 20
Warning Message                            Estimates may not have converged.
Output 3.10 Using the ESTIMATE Statement for Series 8: PROC ARIMA (continued)

Conditional Least Squares Estimation

                         Standard                 Approx
Parameter    Estimate       Error    t Value    Pr > |t|    Lag
MU           -0.30535     0.43056      -0.71      0.4793      0
MA1,1         0.68031     0.36968       1.84      0.0678      1
MA1,2         0.31969     0.17448       1.83      0.0690      2
AR1,1         1.52519     0.35769       4.26      <.0001      1
AR1,2        -0.52563     0.18893      -2.78      0.0061      2

Constant Estimate      -0.00013
Variance Estimate      1.135473
Std Error Estimate     1.065586
AIC                    449.6538
SBC                     464.707
Number of Residuals         150
* AIC and SBC do not include log determinant.

Correlations of Parameter Estimates

Parameter        MU     MA1,1     MA1,2     AR1,1     AR1,2
MU            1.000    -0.258    -0.155    -0.230     0.237
MA1,1        -0.258     1.000     0.561     0.974    -0.923
MA1,2        -0.155     0.561     1.000     0.633    -0.442
AR1,1        -0.230     0.974     0.633     1.000    -0.963
AR1,2         0.237    -0.923    -0.442    -0.963     1.000

Autocorrelation Check of Residuals

 To     Chi-            Pr >
Lag   Square    DF     ChiSq    ---------------Autocorrelations---------------
  6     2.07     2    0.3549    -0.007   0.001   0.038  -0.068   0.082  -0.022
 12     3.22     8    0.9198    -0.059  -0.029   0.010  -0.037   0.034  -0.016
 18     5.78    14    0.9716     0.004   0.106   0.051   0.037   0.002   0.003
 24    11.76    20    0.9242    -0.035   0.099  -0.050   0.007   0.002  -0.141
 30    13.20    26    0.9821    -0.043  -0.059   0.004   0.034  -0.011  -0.035

Model for Variable y8

Estimated Mean    -0.30535

Autoregressive Factors
Factor 1: 1 - 1.52519 B**(1) + 0.52563 B**(2)

Moving Average Factors
Factor 1: 1 - 0.68031 B**(1) - 0.31969 B**(2)









To understand the failure to converge, note that

$$Y_t - .6Y_{t-1} = e_t + .4e_{t-1}$$

implies that

$$Y_{t-1} - .6Y_{t-2} = e_{t-1} + .4e_{t-2}$$

Now multiply this last equation on both sides by $\varphi$ and add to the first equation, obtaining

$$Y_t + (\varphi - .6)Y_{t-1} - .6\varphi Y_{t-2} = e_t + (\varphi + .4)e_{t-1} + .4\varphi e_{t-2}$$

Every $\varphi$ yields a different ARMA(2,2), each equivalent to the original $Y_8$. Thus, the procedure
could not find one ARMA(2,2) model that seemed best. Although you sometimes overfit and test
coefficients for significance to select a model (as illustrated with the iron and steel data), the example
above shows that this method fails when you overfit on both sides of the ARMA equation at once.
Notice that $(1 - 1.525B + .525B^2)(Y_t - \mu) = (1 - .6803B - .3197B^2)e_t$ is the same as
$(1 - B)(1 - .525B)(Y_t - \mu) = (1 - B)(1 + .3197B)e_t$ or, eliminating the common factor,

$$(1 - .525B)(Y_t - \mu) = (1 + .3197B)e_t$$
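
The common factor can be checked by direct multiplication. The following worked lines are a sketch of that check, using the coefficients rounded as in the text, and writing the multiplier from the argument above as $\varphi$:

```latex
\begin{align*}
% Both sides of the augmented model share the factor (1 + \varphi B)
1 + (\varphi - .6)B - .6\varphi B^2 &= (1 + \varphi B)(1 - .6B) \\
1 + (\varphi + .4)B + .4\varphi B^2 &= (1 + \varphi B)(1 + .4B) \\
% The fitted ARMA(2,2) corresponds to a common factor near (1 - B)
(1 - B)(1 - .525B) &= 1 - 1.525B + .525B^2 \\
(1 - B)(1 + .3197B) &= 1 - .6803B - .3197B^2
\end{align*}
```

The last two lines match the fitted autoregressive and moving average factors in Output 3.10 to the precision shown, so the fitted ARMA(2,2) is the fitted ARMA(1,1) with an extra factor close to $(1 - B)$ on each side.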


3.4.5 Nonstationary Series 

The theory behind PROC ARIMA requires that a series be stationary. Theoretically, the stationarity
of a series

$$(1 - \alpha_1 B - \alpha_2 B^2 - \cdots - \alpha_p B^p)(Y_t - \mu) = (1 - \beta_1 B - \beta_2 B^2 - \cdots - \beta_q B^q)e_t$$

hinges on the solutions M of the characteristic equation

$$1 - \alpha_1 M - \alpha_2 M^2 - \cdots - \alpha_p M^p = 0$$

If all Ms that satisfy this equation have |M| > 1, the series is stationary. For example, the series

$$(1 - 1.5B + .64B^2)(Y_t - \mu) = (1 + .8B)e_t$$

is stationary, but the following series is not:

$$(1 - 1.5B + .5B^2)(Y_t - \mu) = (1 + .8B)e_t$$

The characteristic equation for the nonstationary example above is

$$1 - 1.5M + .5M^2 = 0$$

with solutions M=1 and M=2. These solutions are called roots of the characteristic polynomial, and
because one of them is 1 the series is nonstationary. This unit root nonstationarity has several
implications, which are explored below. The overfit example at the end of the previous section ended
when the common factor $(1 - \varphi B)$ neared $(1 - B)$, an "unstable" value.
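
As a worked check of the two examples, the roots can be found from the quadratic formula. This is a sketch of the arithmetic only:

```latex
\begin{align*}
% Stationary example: 1 - 1.5M + .64M^2 = 0
M &= \frac{1.5 \pm \sqrt{1.5^2 - 4(.64)}}{2(.64)}
   = \frac{1.5 \pm i\sqrt{.31}}{1.28},
  & |M|^2 &= \frac{1}{.64} = 1.5625, \quad |M| = 1.25 > 1 \\
% Nonstationary example: 1 - 1.5M + .5M^2 = 0
1 - 1.5M + .5M^2 &= .5(M - 1)(M - 2),
  & M &= 1 \ \text{or} \ M = 2
\end{align*}
```

In the first case the roots are a complex conjugate pair, so the squared modulus equals the product of the roots, which is the constant term divided by the leading coefficient. Both roots lie outside the unit circle. In the second case one root is exactly 1, which is the unit root.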
First, expanding the model gives

$$Y_t - 1.5Y_{t-1} + .5Y_{t-2} - (1 - 1.5 + .5)\mu = e_t + .8e_{t-1}$$

which shows that $\mu$ drops out of the equation because its coefficient is 0. As a result, series
forecasts do not tend to return to the historic series mean. This is in contrast to stationary series,
where $\mu$ is estimated and where forecasts always approach this estimated mean.

In the nonstationary example, $Y_t$ is the series level and

$$W_t = Y_t - Y_{t-1}$$

is the first difference or change in the series. By substitution,

$$W_t - .5W_{t-1} = e_t + .8e_{t-1}$$

so when the levels $Y_t$ satisfy an equation with a single unit root nonstationarity, the first differences
$W_t$ satisfy a stationary equation, often with mean 0. Similarly, you can eliminate a double unit root as
in

$$(1 - 2B + B^2)(Y_t - \mu) = e_t + .8e_{t-1}$$

by computing and then analyzing the second difference

$$W_t - W_{t-1} = Y_t - 2Y_{t-1} + Y_{t-2}$$

The first and second differences are often written $\nabla Y_t$ and $\nabla^2 Y_t$. For nonseasonal data, you rarely
difference more than twice.

Because you do not know the model, how do you know when to difference? You decide by 
examining the ACF or performing a test as in Section 3.4.8. If the ACF dies off very slowly, a unit 
root is indicated. The slow dying off may occur after one or two substantial drops in the ACF. Note 
that the sequence 1, .50, .48, .49, .45, .51, .47, ... is considered to die off slowly in this context even 
though the initial drop from 1 to .5 is large and the magnitude of the autocorrelation is not near 1. 

Using the IDENTIFY statement, you can accomplish differencing easily. The statement 
IDENTIFY VAR=Y(1); 

produces the correlation function for $W_t$, where
$$W_t = Y_t - Y_{t-1}$$

A subsequent ESTIMATE statement operates on $W_t$, so the NOCONSTANT option is normally
used. The statement

IDENTIFY VAR=Y(1,1);
specifies analysis of the second difference, or

$$(Y_t - Y_{t-1}) - (Y_{t-1} - Y_{t-2})$$

The default is no differencing for the variables. Assuming a nonzero mean in the differenced data is
equivalent to assuming a deterministic trend in the original data because
$(\alpha + \beta t) - (\alpha + \beta(t - 1)) = \beta$.
You can fit this $\beta$ easily by omitting the NOCONSTANT option.


3.4.6 Effect of Differencing on Forecasts

PROC ARIMA provides forecasts and 95% upper and lower confidence bounds for predictions for 
the general ARIMA model. If you specify differencing, modeling is done on the differenced series, 
but predictions are given for the original series levels. Also, when you specify a model with 
differencing, prediction error variances increase without bound as you predict further into the future. 

In general, by using estimated parameters and by estimating $\sigma^2$ from the model residuals, you can
easily derive the forecasts and their variances from the model. PROC ARIMA accomplishes this task
for you automatically.

For example, in the model

$$Y_t - 1.5Y_{t-1} + .5Y_{t-2} = e_t$$

note that

$$(Y_t - Y_{t-1}) - .5(Y_{t-1} - Y_{t-2}) = e_t$$

Thus, the first differences
$$W_t = Y_t - Y_{t-1}$$

are stationary. Given data $Y_1, Y_2, \ldots, Y_n$ from this series, you predict future values by first
predicting future values of $W_{n+j}$, using $(.5)^j W_n$ as the prediction. Now

$$Y_{n+j} - Y_n = W_{n+1} + W_{n+2} + \cdots + W_{n+j}$$

so the forecast of $Y_{n+j}$ is

$$Y_n + \sum_{i=1}^{j} (.5)^i W_n$$

To illustrate further, the following computation of forecasts shows a few values of Y, W, and
predictions $\hat{Y}_{n+j}$:

                  Actual                     Forecast
t         98     99    100 (n)     101    102    103    104
Y_t      475    518    550         566    574    578    580
W_t       28     43     32          16      8      4      2

Note that

$$\sum_{i=1}^{j} (.5)^i$$

approaches 1 as j increases, so the forecasts converge to
550 + (1)(32) = 582

Forecast errors can be computed from the forecast errors of the Ws. For example,

$$Y_{n+2} = Y_n + W_{n+1} + W_{n+2}$$

and

$$\hat{Y}_{n+2} = Y_n + .5W_n + .25W_n$$

Rewriting

$$Y_{n+2} = Y_n + (.5W_n + e_{n+1}) + (.25W_n + .5e_{n+1} + e_{n+2})$$
yields the forecast error

$$Y_{n+2} - \hat{Y}_{n+2} = 1.5e_{n+1} + e_{n+2}$$

with the variance $3.25\sigma^2$.
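
The forecast table and the two-step error variance can be reproduced with a few lines of arithmetic. The sketch below is in Python rather than SAS and is only an illustration under the stated model (first differences following an AR(1) with coefficient .5 and known parameters); the function names are assumptions made for this example.

```python
def forecast_levels(y_last, w_last, phi, steps):
    """Level forecasts when the first differences follow W_t = phi*W_{t-1} + e_t."""
    forecasts = []
    level = y_last
    w_hat = w_last
    for _ in range(steps):
        w_hat = phi * w_hat      # predicted change j steps ahead: phi**j * W_n
        level = level + w_hat    # accumulate the predicted changes
        forecasts.append(level)
    return forecasts

def error_variance_multipliers(phi, max_lead):
    """Multiples of sigma^2 giving the forecast error variance at leads 1..max_lead."""
    multipliers = []
    shock_weight_in_w = 1.0   # weight phi**j of a shock in the differenced series
    level_weight = 0.0        # weight 1 + phi + ... + phi**j of a shock in the level
    total = 0.0
    for _ in range(max_lead):
        level_weight += shock_weight_in_w
        total += level_weight ** 2
        multipliers.append(total)
        shock_weight_in_w *= phi
    return multipliers

forecasts = forecast_levels(y_last=550.0, w_last=32.0, phi=0.5, steps=4)
limit = 550.0 + 32.0 * 0.5 / (1.0 - 0.5)
multipliers = error_variance_multipliers(phi=0.5, max_lead=3)

print(forecasts)     # forecasts for t = 101, ..., 104
print(limit)         # value the forecasts converge to
print(multipliers)   # lead-2 entry is the 3.25 derived above
```

Because the level weights do not die out, the multipliers keep growing with the lead, which is the unbounded prediction error variance noted at the start of this section.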


3.4.7 Examples: Forecasting IBM Series and Silver Series

An example that obviously needs differencing is the IBM stock price series reported by Box and 
Jenkins (1976). In this example, the data are analyzed with PROC ARIMA and are forecast 15 
periods ahead. Box and Jenkins report values of daily closing prices of IBM stock. You read in the 
series and check the ACF: 

DATA IBM;
   INPUT PRICE @@;
   T+1;
   CARDS;
data lines
;
RUN;

PROC ARIMA DATA=IBM;
   IDENTIFY VAR=PRICE CENTER NLAG=15;
   IDENTIFY VAR=PRICE(1) NLAG=15;
RUN;


The plot of the original data is shown in Output 3.11, and the IDENTIFY results in Output 3.12.

Output 3.11 Plotting the Original Data

[Plot: SERIES B, IBM DAILY STOCK PRICES 17MAY61 TO 2NOV62; vertical axis PRICE]

Output 3.12 Identifying the IBM Price Series

                      SERIES B
      IBM DAILY STOCK PRICES 17MAY61 TO 2NOV62

                 The ARIMA Procedure

              Name of Variable = PRICE

Mean of Working Series              0
Standard Deviation           84.10504
Number of Observations            369

                 Autocorrelations ❶

Lag    Covariance    Correlation    Std Error
  0      7073.658        1.00000     0
  1      7026.966        0.99340     0.052058
  2      6973.914        0.98590     0.089771
  3      6918.629        0.97808     0.115443
  4      6868.433        0.97099     0.136059
  5      6817.810        0.96383     0.153695
  6      6763.587        0.95617     0.169285
  7      6705.771        0.94799     0.183337
  8      6645.401        0.93946     0.196172
  9      6580.448        0.93028     0.208008
 10      6522.985        0.92215     0.218993
 11      6466.010        0.91410     0.229274
 12      6407.497        0.90583     0.238947
 13      6348.092        0.89743     0.248078
 14      6289.664        0.88917     0.256726
 15      6230.941        0.88087     0.264940

              Inverse Autocorrelations

Lag    Correlation
  1       -0.51704
  2       -0.02791
  3        0.07838
  4       -0.01677
  5       -0.03290
  6        0.02005
  7        0.01682
  8       -0.09039
  9        0.10983
 10       -0.02725
 11       -0.02739
 12        0.00859
 13        0.01911
 14       -0.01985
 15        0.00673

              Partial Autocorrelations ❷

Lag    Correlation
  1        0.99340
  2       -0.07164
  3       -0.02325
  4        0.05396
  5       -0.01535
  6       -0.04422
  7       -0.03394
  8       -0.02613
  9       -0.05364
 10        0.08124
 11       -0.00845
 12       -0.02830
 13        0.00187
 14        0.01204
 15       -0.01469

          Autocorrelation Check for White Noise

 To     Chi-            Pr >
Lag   Square    DF     ChiSq    ------------Autocorrelations------------
  6  2135.31     6    <.0001    0.993  0.986  0.978  0.971  0.964  0.956
 12  4097.40    12    <.0001    0.948  0.939  0.930  0.922  0.914  0.906

              Name of Variable = PRICE

Period(s) of Differencing                          1
Mean of Working Series                      -0.27989
Standard Deviation                          7.248345
Number of Observations                           368
Observation(s) eliminated by differencing          1

                 Autocorrelations ❸

Lag    Covariance    Correlation    Std Error
  0     52.538509        1.00000     0
  1      4.496014        0.08558 ❺   0.052129
  2     -0.072894        -.00139     0.052509
  3     -2.853759        -.05432     0.052509
  4     -1.820817        -.03466     0.052662
  5     -1.261461        -.02401     0.052723
  6      6.350064        0.12086     0.052753
  7      3.585725        0.06825     0.053500
  8      1.871606        0.03562     0.053736
  9     -3.483286        -.06630     0.053801
 10      1.149218        0.02187     0.054022
 11      4.043788        0.07697     0.054046
 12      2.816399        0.05361     0.054343
 13     -2.508704        -.04775     0.054487
 14      3.445101        0.06557     0.054600
 15     -3.470001        -.06605     0.054814

              Inverse Autocorrelations

Lag    Correlation
  1       -0.08768
  2       -0.01236
  3        0.02663
  4        0.04032
  5        0.04148
  6       -0.10091
  7       -0.05960
  8       -0.02455
  9        0.05083
 10       -0.02460
 11       -0.07939
 12       -0.03140
 13        0.07809
 14       -0.07742
 15        0.05362

              Partial Autocorrelations

Lag    Correlation
  1        0.08558
  2       -0.00877
  3       -0.05385
  4       -0.02565
  5       -0.01940
  6        0.12291
  7        0.04555
  8        0.02375
  9       -0.06241
 10        0.04501
 11        0.08667
 12        0.02638
 13       -0.07034
 14        0.07191
 15       -0.05660

          Autocorrelation Check for White Noise ❹

 To     Chi-            Pr >
Lag   Square    DF     ChiSq    ------------Autocorrelations--------------
  6     9.98     6    0.1256    0.086 -0.001 -0.054 -0.035 -0.024  0.121 ❼
 12    17.42    12    0.1344    0.068  0.036 -0.066  0.022  0.077  0.054


The ACF ❶ dies off very slowly. The PACF ❷ indicates a very high coefficient, 0.99340, in the
regression of $Y_t$ on $Y_{t-1}$. The ACF of the differenced series ❸ looks like white noise. In fact, the Q
statistics 9.98 and 17.42 ❹ are not significant. For example, the probability of a value larger than
9.98 in a $\chi^2_6$ distribution is .126, so 9.98 is to the left of the critical value and, therefore, is not
significant. The Q statistics are computed with the first six (9.98) and first twelve (17.42)
autocorrelations of the differenced series. With a first difference, it is common to find an indication
of a lag 1 MA term. The first autocorrelation is 0.08558 ❺ with a standard error of about
$1/(368)^{1/2} = .052$.
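
As a check on the numbers quoted above, the Q statistic for the first six lags can be recomputed from the printed autocorrelations of the differenced series. This Python sketch assumes the Ljung-Box form of the statistic, $Q = n(n+2)\sum_k r_k^2/(n-k)$, and uses the closed-form upper tail of a chi-square distribution with an even number of degrees of freedom; the function names are illustrative.

```python
import math

def ljung_box_q(autocorrs, n):
    """Q = n(n+2) * sum_k r_k^2 / (n - k) over lags k = 1, 2, ..."""
    return n * (n + 2) * sum(r * r / (n - k) for k, r in enumerate(autocorrs, start=1))

def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of degrees of freedom."""
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k       # (x/2)**k / k!
        total += term
    return math.exp(-half) * total

n = 368  # observations remaining after one difference
r_diff = [0.08558, -0.00139, -0.05432, -0.03466, -0.02401, 0.12086]

q6 = ljung_box_q(r_diff, n)
p6 = chi2_upper_tail_even_df(q6, df=6)
se_lag1 = 1.0 / math.sqrt(n)

print(round(q6, 2), round(p6, 4), round(se_lag1, 3))
```

The lag 1 autocorrelation 0.08558 is about 1.6 of these standard errors from 0, which is why the evidence for an MA(1) term is only suggestive.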

Next, suppress the printout with the IDENTIFY statement (you have already looked at it but still 
want PROC ARIMA to compute initial estimates) and estimate the model: 

PROC ARIMA DATA=IBM; 

IDENTIFY VAR=PRICE(1) NOPRINT; 

ESTIMATE Q=1 NOCONSTANT; 

RUN; 


The results are shown in Output 3.13. 


Output 3.13 Analyzing Daily Series with the ESTIMATE Statement: PROC ARIMA

                      SERIES B
      IBM DAILY STOCK PRICES 17MAY61 TO 2NOV62
                 The ARIMA Procedure

          Conditional Least Squares Estimation

                           Standard                  Approx
Parameter    Estimate         Error     t Value    Pr > |t|
MA1,1        -0.08658       0.05203    -1.66 ❻      0.0970

Variance Estimate       52.36132
Std Error Estimate      7.236112
AIC                     2501.943
SBC                     2505.851
Number of Residuals          368
* AIC and SBC do not include log determinant.

            Autocorrelation Check of Residuals

   To     Chi-            Pr >
  Lag   Square    DF     ChiSq    ------------Autocorrelations------------
    6     6.99     5    0.2217    0.001  0.005 -0.051 -0.026 -0.030  0.120
   12    13.94    11    0.2365    0.056  0.039 -0.070  0.024  0.072  0.054
❽  18    31.04    17    0.0198   -0.057  0.079 -0.081  0.118  0.113  0.040
   24    39.05    23    0.0196    0.041  0.072 -0.089 -0.027  0.066  0.025
   30    49.83    29    0.0094    0.028 -0.100 -0.055  0.051  0.028  0.099
   36    56.47    35    0.0122    0.072 -0.074 -0.063 -0.007  0.022  0.035
   42    64.42    41    0.0112    0.066 -0.085  0.059 -0.060  0.018  0.017
   48    76.33    47    0.0044   -0.116 -0.037  0.073  0.005  0.069  0.057

Model for Variable PRICE
Period(s) of Differencing 1
No mean term in this model.

Moving Average Factors
Factor 1: 1 + 0.08658 B**(1)


Although the evidence is not strong enough to indicate that the series has a nonzero first-order
autocorrelation, you nevertheless fit the MA(1) model. The t statistic -1.66 ❻ is significant at the
10% level.

More attention should be paid to the lower-order and seasonal autocorrelations than to the others. In
this example, you ignore an autocorrelation 0.121 ❼ at lag 6 that was even bigger than the lag 1
autocorrelation. Similarly, residuals from the final fitted model show a Q statistic 31.04 ❽ that
attains significance because of autocorrelations .118 and .113 at lags 16 and 17. Ignore this
significance in favor of the more parsimonious MA(1) model.


The model appears to fit; therefore, make a third run to forecast: 

PROC ARIMA DATA=IBM; 

IDENTIFY VAR=PRICE(1) NOPRINT; 

ESTIMATE Q=1 NOCONSTANT NOPRINT; 

FORECAST LEAD=15; 

RUN; 


See the forecasts in Output 3.14.

Output 3.14 Forecasting Daily Series: PROC ARIMA

                      SERIES B
      IBM DAILY STOCK PRICES 17MAY61 TO 2NOV62
                 The ARIMA Procedure
Model for variable PRICE
Period(s) of Differencing 1
No mean term in this model.

Moving Average Factors
Factor 1: 1 + 0.08658 B**(1)

Forecasts for variable PRICE

Obs    Forecast    Std Error    95% Confidence Limits
370    357.3837       7.2361    343.2012    371.5662
371    357.3837      10.6856    336.4403    378.3270
372    357.3837      13.2666    331.3817    383.3857
[more output lines]
382    357.3837      28.1816    302.1487    412.6187
383    357.3837      29.2579    300.0392    414.7282
384    357.3837      30.2960    298.0047    416.7627
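
The Std Error column of Output 3.14 can be reproduced by hand. For the model $Y_t - Y_{t-1} = e_t - \theta e_{t-1}$, each past shock enters every later level with weight $1 - \theta$, so the lead L forecast error variance is $\sigma^2[1 + (L-1)(1-\theta)^2]$. The Python sketch below assumes the printed estimates (MA factor 1 + 0.08658B, so $\theta = -0.08658$, and Std Error Estimate 7.236112) and treats them as known; the function name is an illustrative assumption.

```python
import math

def ima11_forecast_std_errors(theta, sigma, max_lead):
    """Forecast standard errors for Y_t - Y_{t-1} = e_t - theta*e_{t-1}, leads 1..max_lead."""
    psi = 1.0 - theta  # weight of every shock older than the most recent one
    return [sigma * math.sqrt(1.0 + (lead - 1) * psi * psi)
            for lead in range(1, max_lead + 1)]

std_errors = ima11_forecast_std_errors(theta=-0.08658, sigma=7.236112, max_lead=15)

for lead in (1, 2, 3, 15):
    print(lead, round(std_errors[lead - 1], 4))
```

The variance grows linearly in the lead, so the confidence limits widen without bound while the point forecast stays flat at 357.3837.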


If

$$Y_t - Y_{t-1} = e_t - \beta e_{t-1}$$

as in the IBM example, then by repeated back substitution
$$e_t = (Y_t - Y_{t-1}) + \beta(Y_{t-1} - Y_{t-2}) + \beta^2(Y_{t-2} - Y_{t-3}) + \cdots$$
or

$$Y_t = e_t + (1 - \beta)[Y_{t-1} + \beta Y_{t-2} + \beta^2 Y_{t-3} + \cdots]$$

so that

$$\hat{Y}_t = (1 - \beta)(Y_{t-1} + \beta Y_{t-2} + \beta^2 Y_{t-3} + \cdots)$$

Forecasting $Y_t$ by such an exponentially weighted sum of past Ys is called single exponential
smoothing. Higher degrees of differencing plus the inclusion of more MA terms is equivalent to
higher-order exponential smoothing. PROC ARIMA, however, unlike PROC FORECAST with
METHOD=EXPO, estimates the parameters from the data.
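
The exponentially weighted sum is usually computed recursively rather than term by term. The following Python sketch shows that the two computations agree when the sum is truncated at the start of the data; the smoothing weight 0.5 and the three data values are hypothetical choices for illustration only, not estimates from the IBM series.

```python
def weighted_sum_forecast(history, beta):
    """(1 - beta) * (Y_{t-1} + beta*Y_{t-2} + beta^2*Y_{t-3} + ...), truncated at the data."""
    weight = 1.0 - beta
    total = 0.0
    for y in reversed(history):   # most recent observation first
        total += weight * y
        weight *= beta
    return total

def recursive_forecast(history, beta):
    """Same forecast via the update: new = (1 - beta)*Y + beta*old, starting from 0."""
    forecast = 0.0
    for y in history:             # oldest observation first
        forecast = (1.0 - beta) * y + beta * forecast
    return forecast

history = [475.0, 518.0, 550.0]   # hypothetical values, oldest to newest
direct = weighted_sum_forecast(history, beta=0.5)
recursive = recursive_forecast(history, beta=0.5)
print(direct, recursive)
```

The recursive form needs only the previous forecast and the newest observation, which is why exponential smoothing is cheap to update as data arrive.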

Dickey and Fuller (1979) give a formal test of the null hypothesis that an AR series has a unit
root nonstationarity versus the alternative that it is stationary. Said and Dickey (1984) extend
the test to ARIMA models. The test involves a regression of $\nabla Y_t$ (where $\nabla Y_t = Y_t - Y_{t-1}$)
on $Y_{t-1} - \bar{Y}, \nabla Y_{t-1}, \ldots, \nabla Y_{t-p}$, where p is at least as large as the
order of the AR process or, in the case of the mixed process, is large enough to give a good
approximation to the model. The t test on $Y_{t-1} - \bar{Y}$ is called $\tau_\mu$ because it does not have
a Student's t distribution and must be compared to tables provided by Fuller (1996, p. 642). The
silver series from Chapter 2, “Simple Models: Autoregression,” is used as an illustration in the next
section.


3.4.8 Models for Nonstationary Data 

You can formally test for unit root nonstationarity with careful modeling and special distributions.
Any autoregressive model like the AR(2) model

$$Y_t - \mu = \alpha_1(Y_{t-1} - \mu) + \alpha_2(Y_{t-2} - \mu) + e_t$$

can be written in terms of differences and the lagged level term $(Y_{t-1} - \mu)$. With a little algebra,
the AR(2) becomes

$$Y_t - Y_{t-1} = -(1 - \alpha_1 - \alpha_2)(Y_{t-1} - \mu) - \alpha_2(Y_{t-1} - Y_{t-2}) + e_t$$

Stationarity depends on the roots of the characteristic equation $1 - \alpha_1 M - \alpha_2 M^2 = 0$, so if M = 1 is a
root, then $(1 - \alpha_1 - \alpha_2) = 0$. So the $(Y_{t-1} - \mu)$ term drops out of the model and forecasts do not revert
to the mean. This discussion suggests a least squares regression of $Y_t - Y_{t-1}$ on $Y_{t-1}$ and
$(Y_{t-1} - Y_{t-2})$ with an intercept and the use of the resulting coefficient or t test
on the $Y_{t-1}$ term as a test of the null hypothesis that the series has a unit root nonstationarity. If all
roots M exceed 1 in magnitude, the coefficient of $(Y_{t-1} - \mu)$ will be negative, suggesting a one-tailed
test to the left if stationarity is the alternative. There is, however, one major problem with this idea:
neither the estimated coefficient of $(Y_{t-1} - \mu)$ nor its t test has a standard distribution, even when the
sample size becomes very large. This does not mean the test cannot be done, but it does require the
tabulation of a new distribution for the test statistics.
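
The rewriting of the AR(2) in terms of a lagged level and a lagged difference is easy to verify numerically. This Python sketch uses hypothetical values for the mean and the two lagged observations; the function names are assumptions introduced for the illustration.

```python
def ar2_difference_form(a1, a2):
    """Coefficients of (Y_{t-1} - mu) and (Y_{t-1} - Y_{t-2}) in the difference form."""
    level_coef = -(1.0 - a1 - a2)
    lagged_diff_coef = -a2
    return level_coef, lagged_diff_coef

def mean_from_level_form(a1, a2, mu, y1, y2):
    """Predicted Y_t from Y_t - mu = a1*(Y_{t-1} - mu) + a2*(Y_{t-2} - mu)."""
    return mu + a1 * (y1 - mu) + a2 * (y2 - mu)

def mean_from_difference_form(a1, a2, mu, y1, y2):
    """Predicted Y_t from the equivalent difference form."""
    level_coef, lagged_diff_coef = ar2_difference_form(a1, a2)
    return y1 + level_coef * (y1 - mu) + lagged_diff_coef * (y1 - y2)

# Stationary example (1 - 1.5B + .64B^2) and unit root example (1 - 1.5B + .5B^2)
stationary_coefs = ar2_difference_form(1.5, -0.64)
unit_root_coefs = ar2_difference_form(1.5, -0.5)

# Hypothetical mean and lagged values, used only to compare the two forms
lhs = mean_from_level_form(1.5, -0.64, mu=100.0, y1=104.0, y2=97.0)
rhs = mean_from_difference_form(1.5, -0.64, mu=100.0, y1=104.0, y2=97.0)
print(stationary_coefs, unit_root_coefs, lhs, rhs)
```

In the unit root case the lagged level coefficient is exactly 0, so the level and the mean no longer enter the prediction of the change.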

Dickey and Fuller (1979, 1981) studied the distributions of estimators and t statistics in
autoregressive models with unit roots. The first line of each entry in the following table shows the
regressions they studied. Here $Y_t - Y_{t-1} = \nabla Y_t$ denotes a first difference.

Regress $\nabla Y_t$ on these: $Y_{t-1}, \nabla Y_{t-1}, \ldots, \nabla Y_{t-k}$
   AR(1) in deviations form:   $Y_t = \rho Y_{t-1} + e_t$
   AR(1) in regression form:   $\nabla Y_t = (\rho - 1)Y_{t-1} + e_t$
   Under $H_0: \rho = 1$:      $\nabla Y_t = e_t$

Regress $\nabla Y_t$ on these: $Y_{t-1}, 1, \nabla Y_{t-1}, \ldots, \nabla Y_{t-k}$
   AR(1) in deviations form:   $Y_t - \mu = \rho(Y_{t-1} - \mu) + e_t$
   AR(1) in regression form:   $\nabla Y_t = (1 - \rho)\mu + (\rho - 1)Y_{t-1} + e_t$
   Under $H_0: \rho = 1$:      $\nabla Y_t = e_t$

Regress $\nabla Y_t$ on these: $Y_{t-1}, 1, t, \nabla Y_{t-1}, \ldots, \nabla Y_{t-k}$
   AR(1) in deviations form:   $Y_t - \alpha - \beta t = \rho(Y_{t-1} - \alpha - \beta(t - 1)) + e_t$
   AR(1) in regression form:   $\nabla Y_t = (1 - \rho)(\alpha + \beta t) + \rho\beta + (\rho - 1)Y_{t-1} + e_t$
   Under $H_0: \rho = 1$:      $\nabla Y_t = \beta + e_t$

The lagged differences are referred to as “augmenting lags” and the tests as “Augmented Dickey-Fuller”
or “ADF” tests. The three regression models allow for three kinds of trends. For illustration a
lag 1 autoregressive model with autoregressive parameter $\rho$ is shown in the preceding table both in
deviations form and in the algebraically equivalent regression form. The deviations form is most
instructive. It shows that if $|\rho| < 1$ and if we have appropriate starting values, then the expected value
of $Y_t$ is 0, $\mu$, or $\alpha + \beta t$ depending on which model is assumed. Fit the first model only if you know
the mean of your data is 0 (for example, $Y_t$ might already be a difference of some observed
variable). Use the third model if you suspect a regular trend up or down in your data. If you fit the
third model when $\beta$ is really 0, your tests will be valid, but not as powerful as those from the second
model. The parameter $\beta$ represents a trend slope when $|\rho| < 1$ and is called a “drift” when $\rho = 1$.

Note that for known parameters and n data points, the forecast of $Y_{n+L}$ would be
$\alpha + \beta(n + L) + \rho^L(Y_n - \alpha - \beta n)$ for $|\rho| < 1$ with forecast error variance
$(1 + \rho^2 + \cdots + \rho^{2L-2})\sigma^2$. As L
increases, the forecast error variance approaches $\sigma^2/(1 - \rho^2)$, the variance of Y around the trend.
However, if $\rho = 1$ the L step ahead forecast is $Y_n + \beta L$ with forecast error variance $L\sigma^2$, so that the
error variance increases without bound in this case. In both cases, the forecasts have a component
that increases at the linear rate $\beta$.
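
The contrast between the stationary and unit root cases of the trend model can be seen by evaluating the forecast and its error variance directly. The Python sketch below assumes known parameters, as in the paragraph above; the numerical values passed to the functions are hypothetical and the function names are illustrative.

```python
def trend_ar1_forecast(alpha, beta, rho, y_n, n, lead):
    """L-step forecast alpha + beta*(n + L) + rho**L * (Y_n - alpha - beta*n)."""
    return alpha + beta * (n + lead) + rho ** lead * (y_n - alpha - beta * n)

def forecast_error_variance(rho, sigma2, lead):
    """(1 + rho^2 + ... + rho^(2L-2)) * sigma^2."""
    return sigma2 * sum(rho ** (2 * j) for j in range(lead))

# Stationary deviations from trend: the variance levels off at sigma^2 / (1 - rho^2)
bounded = forecast_error_variance(rho=0.5, sigma2=1.0, lead=200)

# Unit root with drift: the variance is L * sigma^2 and the forecast is Y_n + beta*L
unbounded = forecast_error_variance(rho=1.0, sigma2=2.0, lead=5)
drift_forecast = trend_ar1_forecast(alpha=10.0, beta=2.0, rho=1.0, y_n=115.0, n=50, lead=3)

print(bounded, unbounded, drift_forecast)
```

With $\rho = 1$ the intercept $\alpha$ cancels out of the forecast, which then starts from the last observation instead of returning to the trend line.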

For the regression under discussion, the distributions for the coefficients of $Y_{t-1}$, 1, and t are all
nonstandard. Tables of critical values and discussion of the theory are given in Fuller (1996). One
very nice feature of these regressions is that the coefficients of the lagged differences $\nabla Y_{t-j}$ have
normal distributions in the limit. Thus a standard F test to see if a set of these lagged differences can
be omitted is justified in large samples, as are the t statistics for the individual lagged difference
coefficients. They converge to standard normal distributions. The coefficients of $Y_{t-1}$ and the
associated t tests have distributions that differ among the three regressions and are nonstandard.
Fortunately, however, the t test statistics have the same limit distributions no matter how many
augmenting lags are used.

As an example, stocks of silver on the New York Commodities Exchange were analyzed in Chapter 2
of this book. We reanalyze the data here using DEL to denote the difference, DELi for its ith lag, and
LSILVER for the lagged level of silver. The WHERE PART=1; statement restricts analysis to the
data used in the first edition.

PROC REG DATA=SILVER;
   MODEL DEL=LSILVER DEL1 DEL2 DEL3 DEL4 / NOPRINT;
   TEST DEL2=0, DEL3=0, DEL4=0;
   WHERE PART=1;
RUN;

PROC REG DATA=SILVER;
   MODEL DEL=LSILVER DEL1;
   WHERE PART=1;
RUN;

Some output follows. First you have the result of the TEST statement for the model with four
augmenting lags in Output 3.15.

Output 3.15 Test of Augmenting Lags

     Test 1 Results for Dependent Variable DEL

                             Mean
Source             DF      Square    F Value    Pr > F
Numerator           3  1152.19711       1.32    0.2803
Denominator        41   871.51780

Because this test involves only the lagged differences, the F distribution is justified in large samples. 
Although the sample size here is not particularly large, the p-value 0.2803 is not even close to 0.05, 
thus providing no evidence against leaving out all but the first augmenting lag. The second PROC 
REG produces Output 3.16. 


Output 3.16 PROC REG on Silver Data

                     Parameter Estimates

                    Parameter     Standard
Variable     DF      Estimate        Error    t Value    Pr > |t|
Intercept     1      75.58073     27.36395       2.76      0.0082
LSILVER       1      -0.11703      0.04216      -2.78      0.0079
DEL1          1       0.67115      0.10806       6.21      <.0001


Because the printed p-value 0.0079 is less than 0.05, the uninformed user might conclude that there is
strong evidence against a unit root in favor of stationarity. This is an error because all p-values from
PROC REG are computed from the t distribution whereas, under the null hypothesis of a unit root,
this statistic has the distribution tabulated by Dickey and Fuller. The appropriate 5% left tail critical
value of the limit distribution is -2.86 (Fuller 1996, p. 642), so the statistic is not far enough below 0
to reject the unit root null hypothesis. Nonstationarity cannot be rejected. This test is also available in
PROC ARIMA starting with Version 6 and can be obtained as follows:

PROC ARIMA DATA=SILVER;
   I VAR=SILVER STATIONARITY=(ADF=(1)) OUTCOV=ADF;
RUN;

Output 3.17 contains several tests. 


Output 3.17 Unit Root Tests, Silver Data

               Augmented Dickey-Fuller Unit Root Tests

Type           Lags         Rho    Pr < Rho      Tau    Pr < Tau       F    Pr > F
Zero Mean         1     -0.2461      0.6232    -0.28      0.5800
Single Mean       1    -17.7945      0.0121    -2.78      0.0689    3.86    0.1197
Trend             1    -15.1102      0.1383    -2.63      0.2697    4.29    0.3484


Every observed data point exceeds 400, so any test from a model that assumes a 0 mean can be
ignored. Also, the PROC REG output strongly indicated that one lagged difference was required.
Thus the tests with no lagged differences can also be ignored and are not requested here. The output
shows coefficient (or “normalized bias”) unit root tests that would be computed as $n(\hat{\rho} - 1)$ in an
AR(1) model with coefficient $\rho$. For the AR(2) model with roots $\rho$ and m, the regression model
form

$$Y_t - Y_{t-1} = -(1 - \alpha_1 - \alpha_2)(Y_{t-1} - \mu) - \alpha_2(Y_{t-1} - Y_{t-2}) + e_t$$

becomes

$$Y_t - Y_{t-1} = -(1 - m)(1 - \rho)(Y_{t-1} - \mu) + m\rho(Y_{t-1} - Y_{t-2}) + e_t$$

so that the coefficient of $Y_{t-1}$ is $-(1 - \rho)(1 - m)$ in terms of the roots. If $\rho = 1$, it is seen that the
coefficient of $(Y_{t-1} - Y_{t-2})$, 0.671152 in the silver example, is an estimate of m, so it is not surprising
that an adjustment using that statistic is required to get a test statistic that behaves like $n(\hat{\rho} - 1)$ under
$H_0: \rho = 1$. Specifically you divide the lag 1 coefficient (-0.117034) by (1 - .671152), then multiply
by n. Similar adjustments can be made in higher-order processes. For the silver data,
50(-0.117034)/(1 - .671152) = -17.7945 is shown in the printout and has a p-value (.0121) less than
0.05. However, based on simulated size and power results (Dickey 1984), the tau tests are preferable
to these normalized bias tests. Furthermore, the adjustment for lagged differences is motivated by
large sample theory and n = 50 is not particularly large. The associated tau test, -2.78, has a p-value
exceeding 0.05 and hence fails to provide significant evidence at the usual 0.05 level against the unit
root null hypothesis. The F type statistics are discussed in Dickey and Fuller (1981). If interest lies
only in inference about $\rho$, there is no advantage to using the F statistics, which include restrictions
on the intercept and trend as a part of $H_0$. Simulations indicate that the polynomial deterministic
trend should have as low a degree as is consistent with the data, in order to get good power. The 50
observations studied thus far do not display any noticeable trend, so the model with a constant mean
seems reasonable, although tests based on the model with linear trend would be valid and would
guard against any unrecognized linear trend. These tests are seen to provide even less evidence
against the unit root. In summary, then, getting a test with validity and good statistical power requires
appropriate decisions about the model, in terms of lags and trends. This is no surprise, as any
statistical hypothesis test requires a realistic model for the data.
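
The adjustment that turns the PROC REG coefficients into the Rho statistic is a one-line calculation. The Python sketch below repeats it for the silver data with one augmenting lag and compares the tau statistic with the 5% critical value quoted earlier; the function name is an illustrative assumption.

```python
def normalized_bias_statistic(n, level_coef, lagged_diff_coef):
    """n * (lagged level coefficient) / (1 - lagged difference coefficient)."""
    return n * level_coef / (1.0 - lagged_diff_coef)

# Silver data: 50 observations, coefficients of LSILVER and DEL1 as quoted in the text
rho_stat = normalized_bias_statistic(n=50, level_coef=-0.117034, lagged_diff_coef=0.671152)

# Tau test: t statistic on LSILVER against the 5% left-tail critical value
tau = -2.78
critical_5pct = -2.86
reject_unit_root = tau < critical_5pct

print(round(rho_stat, 4), reject_unit_root)
```

The tau statistic lies to the right of the critical value, so the unit root null hypothesis is not rejected at the 0.05 level even though the ordinary t table would suggest otherwise.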
The data analyzed here were used in the first edition of this book. Since then, more data on this series
have been collected. The full set of data make it clear that the series is not stationary, in agreement
with the tau statistic. In Output 3.18, the original series of 50 is plotted along with forecasts and
confidence bands from an AR(2) that assumes stationarity in levels (solid lines), and an AR(1) fit to
the differenced data (dashed lines). The more recent data are appended to the original 50. It is seen
that for a few months into the forecast the series stays within the solid line bands, and it appears that
the analyst who chooses stationarity is the better forecaster. He also has much tighter forecast bands.
However, a little further ahead, the observations burst through his bands, never to return. The unit
root forecast, though its bands may seem unpleasantly wide, does seem to give a more realistic
assessment of the uncertainty inherent in this series.


Output 3.18 Silver Series, Stationary and Nonstationary Models

[Plot: SILVER SERIES: FORECASTS FROM 4 1/2 YEARS PLUS ACTUAL FUTURE VALUES; vertical axis SILVER,
horizontal axis DATE]


To illustrate the effects of trends, Output 3.19 shows the logarithm of the closing price of 
Amazon.com stock. The data were downloaded from the stock reports available through the Web 
search engine Yahoo! The closing prices are fairly tightly clustered around a linear trend as displayed 
in the top part of the figure. The ACF, IACF, and PACF of the series are displayed just below the 
series plot and those of the differenced series just below that. Notice that the ACF of the original 
series dies off very slowly. This could be due to a deterministic trend, a unit root, or both. The three 
plots along the bottom seem to indicate that differencing has reduced the series to stationarity.

In contrast, Output 3.20 shows the volume of the same Amazon.com stocks. These too show a trend,
but notice the IACF of the differenced series. If a series has a unit root on the moving average side,
the IACF will die off slowly. This is in line with what you've learned about unit roots on the
autoregressive side. For the model $Y_t = e_t - \rho e_{t-1}$, the dual model obtained by switching the backshift
operator to the AR side is $(1 - \rho B)Y_t = e_t$, so that if $\rho$ is (near) 1 you expect the IACF to behave like
the ACF of a (near) unit root process, that is, to die off slowly.


Output 3.20 Amazon Volume

[Plot: VOLUME; vertical axis volume, with tick marks from 10 to 18]

This behavior is expected anytime $Y_t$ is the difference of an originally stationary series. Chang and
Dickey (1993) give a detailed proof of what happens to the IACF when such overdifferencing occurs.
They find that an essentially linear descent in the IACF is consistent with overdifferencing. This can
follow an initial drop-off, as appears to happen in the volume data. Notice that a linear trend is
reduced to a constant by first differencing, so such a trend will not affect the behavior of the IACF of
the differenced series. Of course a linear trend in the data will make the ACF of the levels appear to
die off very slowly, as is also apparent in the volume data. The apparent mixed message (differencing
indicated by the levels’ ACF and too much differencing indicated by the differences’ IACF) is not
really so inconsistent. You just need to think a little outside the class of ARIMA models to models
with time trends and ARIMA errors.

Regression of differences on 1, t, a lagged level, and lagged differences indicated that no lagged 
differences were needed for the log transformed closing price series and two were needed for 
volume. Using the indicated models, the parameter estimates from PROC REG using the differenced 
series as a response, DATE as the time variable, LAGC and LAGV as the lag levels of closing 
price and volume, respectively, and lagged differences DV1 and DV2 for volume are shown in 
Output 3.21. 


Output 3.21 Closing Price and Volume: Unit Root Test

                              Parameter Estimates

                    Parameter      Standard
Variable     DF      Estimate         Error    t Value    Pr > |t|     Type I SS
Intercept     1      -2.13939       0.87343      -2.45      0.0146       0.02462
date          1    0.00015950    0.00006472       2.46      0.0141    0.00052225
LAGC          1      -0.02910       0.01124      -2.59      0.0099       0.02501

                              Parameter Estimates

                    Parameter      Standard
Variable     DF      Estimate         Error    t Value    Pr > |t|     Type I SS
Intercept     1     -17.43463       3.11590      -5.60      <.0001       0.01588
date          1       0.00147    0.00025318       5.80      <.0001       0.00349
LAGV          1      -0.22354       0.03499      -6.39      <.0001      25.69204
DV1           1      -0.13996       0.04625      -3.03      0.0026       1.04315
DV2           1      -0.16621       0.04377      -3.80      0.0002       4.16502

As before, these tests can be automated using the IDENTIFY statement in PROC ARIMA. For these
examples, clearly only the linear trend tests are to be considered. Although power is gained by using
a lower-order polynomial when it is consistent with the data, the assumption that the trend is simply a
constant is clearly inappropriate here.

The tau statistics (see Fuller 1996) are -2.59 (LAGC) for closing price and -6.39 (LAGV) for
volume. Using the large n critical values -3.13 at significance level 0.10, -3.41 at 0.05, and -3.96
at 0.01, it is seen that unit roots are rejected even at the 0.01 level for volume. Thus the volume series
displays stationary fluctuations around a linear trend. There is no evidence for stationarity in closing
prices even at the 0.10 level, so even though the series seems to hug the linear trend line pretty
closely, the deviations cannot be distinguished from a unit root process whose variance grows
without bound.
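
The comparison of the two tau statistics with the large-sample critical values for the linear trend model can be written out explicitly. In this Python sketch the critical values are the ones quoted in the paragraph above; the dictionary and function names are illustrative assumptions.

```python
# Large-n critical values for the tau test in the linear trend model, as quoted in the text
TREND_CRITICAL_VALUES = {0.10: -3.13, 0.05: -3.41, 0.01: -3.96}

def rejected_levels(tau, critical_values=TREND_CRITICAL_VALUES):
    """Significance levels at which the unit root null is rejected (left-tailed test)."""
    return sorted(level for level, cutoff in critical_values.items() if tau < cutoff)

closing_price_levels = rejected_levels(-2.59)   # t statistic on LAGC
volume_levels = rejected_levels(-6.39)          # t statistic on LAGV

print(closing_price_levels)
print(volume_levels)
```

An empty list for closing price means the unit root cannot be rejected at any of the three levels, while volume rejects at all of them.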

An investment strategy based on an assumption of reversion of log transformed closing prices to the
linear trend line does not seem to be supported here. That is not to refute the undeniable upward trend
in the data; it comes out in the intercept or “drift” term (estimate 0.0068318) of the model for the
differenced series. The model (computations not shown) is

$\nabla Y_t = 0.0068318 + e_t + 0.04547 e_{t-1}$

The differences, $\nabla Y_t$, have this positive drift term as their average, so it implies a positive change on
average with each passing unit of time. A daily increase of 0.0068318 in the logarithm implies a
multiplicative $e^{0.0068318} = 1.00686$ or about 0.69% daily increase, which compounds to an
$e^{260(0.0068318)} = e^{1.78} \approx 6$-fold increase over the roughly 260 trading days in a year. This was a period of
phenomenal growth for many such technology stocks, with this data going from about 3.5 to about
120 over two years’ time, roughly the predicted 36-fold increase.
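
The compounding arithmetic can be checked directly. The sketch below uses only the drift estimate and the 260 trading day year from the text; the variable names are illustrative.

```python
import math

drift = 0.0068318        # estimated drift of the differenced log series
trading_days = 260       # approximate trading days per year, as in the text

daily_factor = math.exp(drift)                    # multiplicative daily growth
annual_factor = math.exp(trading_days * drift)    # growth over one year
two_year_factor = annual_factor ** 2              # growth over two years

print(f"daily factor    : {daily_factor:.5f}")
print(f"annual factor   : {annual_factor:.2f}")
print(f"two-year factor : {two_year_factor:.1f}")
print(f"observed ratio  : {120 / 3.5:.1f}")       # prices went from about 3.5 to about 120
```

Without rounding the annual factor up to 6, the two-year factor comes out near 35, which is close to the observed ratio of about 34.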

The top panel of Output 3.22 shows closing price forecasts and intervals for the unit root with drift 
model (forecast rising almost linearly from the last observation and outermost bands) and for a model 
with stationary residuals from a linear trend (forecast converging to trend line and interior bands) for 
the log scale data. The plot below, in which each of these has been transformed back to the original 
scale by exponentiation, deserves some comments. First, note the strong effect of the logarithmic 
transformation. Any attempt to model on the original scale would have to account for the obviously 
unequal variation in the data and would require a somewhat complex trend function, whereas once 
logs are taken, a rather simple model, random walk with drift, seems to suffice. There is a fairly long 
string of values starting around January 1999 that are pretty far above the trend curve. Recall that 
this trend curve is simply an exponentiation of the linear trend on the log scale and hence 
approximates a median, not a mean. This 50% probability number, the median, may be a more easily 
understood number for an investment strategist than the mean in a highly skewed distribution such as 
this. Also note that the chosen model, random walk with drift, does not even use this curve, so a 
forecast beginning on February 1, 1999, for example, would emanate from the February 1, 1999, data 
point and follow a path approximately parallel to this trend line. The residuals from this trend line 
would not represent forecasting errors from either model. Even for the model that assumes stationary 
but strongly correlated errors, the forecast consists of the trend plus an adjustment based on the error 
correlation structure. 

Output 3.22 Amazon Closing Price (Two Models, Two Scales)

In fact the plot actually contains forecasts throughout the historic series from both models but they
overlay the data so closely as to be hardly distinguishable from it. Note also that the combination of 
logs and differencing, while it makes the transformed series behave nicely statistically, produces very 
wide forecast intervals on the original scale. While this may disappoint the analyst, it might 
nevertheless be a reasonable assessment of uncertainty, given that 95% confidence is required and 
that this is a volatile series. 

In summary, ignorance of unit roots and deterministic trends in time series can lead to clearly 
inappropriate mean reverting forecasts, while careful modeling of unit roots and deterministic trends 
can lead to quite reasonable and informative forecasts. Note that p-values produced under the 
assumption of stationarity can be quite misleading when unit roots are in fact present as shown in the 
silver and stock closing price examples. Both of these show inappropriately small p-values when the 
p-values are computed from the t rather than from the Dickey-Fuller distributions. In the regression 
of differences on trend terms, lagged level, and lagged differences, the usual (t and F) distributions 
are appropriate in large samples for inference on the lagged differences. To get tests with the proper 
behavior, carefully deciding on the number of lagged differences is important. Hall (1992) studies 
several methods and finds that overfitting lagged differences then testing to leave some out is a good 
method. This was illustrated in the silver example and was done for all examples here. Dickey, Bell, 
and Miller (1986) in their appendix show that the addition of seasonal dummy variables to a model 
does not change the large sample (limit) behavior of the unit root tests discussed here.

Some practitioners are under the false impression that differencing is justified anytime data appear to 
have a trend. In fact, such differencing may or may not be appropriate. This is discussed next. 


3.4.9 Differencing to Remove a Linear Trend 

Occasionally, practitioners difference data to remove a linear trend. Note that if $Y_t$ has a linear trend
$\alpha + \beta t$

then the differenced series
$W_t = Y_t - Y_{t-1}$

involves only the constant. For example, suppose
$Y_t = \alpha + \beta t + e_t$
where $e_t$ is white noise. Then

$W_t = \beta + e_t - e_{t-1}$

which does not have a trend but, unfortunately, is a noninvertible moving average. Thus, the data 
have been overdifferenced. Now the IACF of W looks like the ACF of a time series with a unit root 
nonstationarity; that is, the IACF of W dies off very slowly. You can detect overdifferencing this 
way. 
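
A small simulation illustrates the overdifferencing. This is a sketch under stated assumptions: the values of alpha, beta, the series length, and the seed are arbitrary choices for illustration, and the noise is taken to be standard normal. For $W_t = \beta + e_t - e_{t-1}$ the theoretical lag 1 autocorrelation is -0.5, the value for a moving average with coefficient exactly 1.

```python
import random

def lag1_autocorr(x):
    """Sample lag 1 autocorrelation of a sequence."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    c1 = sum((x[i] - mean) * (x[i - 1] - mean) for i in range(1, n))
    return c1 / c0

random.seed(1)                      # arbitrary seed, for reproducibility only
alpha, beta, n = 10.0, 0.5, 20000   # hypothetical trend parameters and length

# Linear trend plus white noise, then the first difference W_t = Y_t - Y_{t-1}.
y = [alpha + beta * t + random.gauss(0.0, 1.0) for t in range(n)]
w = [y[t] - y[t - 1] for t in range(1, n)]

mean_w = sum(w) / len(w)
r1 = lag1_autocorr(w)
print(f"mean of W (estimates beta): {mean_w:.4f}")
print(f"lag 1 autocorrelation of W: {r1:.4f}")
```

The mean of the differences recovers the slope, but the lag 1 autocorrelation near -0.5 signals the noninvertible moving average created by differencing.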

The linear trend plus white noise model presented above is interesting. The ACF of the original data 
dies off slowly because of the trend. You respond by differencing, and then the IACF of the 
differenced series indicates that you have overdifferenced. This mixed signaling by the diagnostic 
functions simply tells you that the data do not fit an ARMA model on the original levels scale or on 
the differences scale. You can obtain the correct analysis in this particular case by regressing Y on t 
using PROC REG or PROC GLM. The situation is different if the error series $e_t$ is not white noise
but is instead a nonstationary time series whose difference $e_t - e_{t-1}$
is stationary. In that case, a model in the differences is appropriate and has an intercept estimating $\beta$.
This scenario seems to hold in the publishing and printing data that produce the plot (U.S. Bureau of
Labor 1977) shown in Output 3.23. The data are the percentages of nonproduction workers in the
industry over several years.


Output 3.23 Plotting the Original Series

[Plot: PUBLISHING AND PRINTING NONPRODUCTION WORKERS, 1944-1977; vertical axis NONPUB]



The ACF shown in Output 3.24 is obtained by specifying the following statements: 

   PROC ARIMA DATA=WORKERS;
      IDENTIFY VAR=NONPUB(1) NLAG=10;
      TITLE 'PUBLISHING AND PRINTING NONPRODUCTION WORKERS';
      TITLE2 '1944-1977';
   RUN;

Because the ACF looks like that of an MA(1) and because it is very common to fit an MA(1) term
when a first difference is taken, you do that fitting by specifying these statements:

   PROC ARIMA DATA=WORKERS;
      IDENTIFY VAR=NONPUB(1) NOPRINT;
      ESTIMATE Q=1;
      FORECAST LEAD=10;
   RUN;

The output shows a good fit based on the Q statistics, the parameter estimates, and their t statistics.
Note that the MU (0.3033) estimate is statistically significant and is roughly the slope in the
plot of the data. Also, the MA coefficient is not near 1; in fact, it is a negative number. Thus, you
have little evidence of overdifferencing. With only 33 observations, you have a lot of sampling
variability (for example, look at the two standard error marks on the ACF). The number 0.3033 is
sometimes called drift.


Output 3.24 Modeling and Forecasting with the IDENTIFY, ESTIMATE, and FORECAST Statements:
PROC ARIMA

          PUBLISHING AND PRINTING NONPRODUCTION WORKERS
                           1944-1977

                      The ARIMA Procedure

Name of Variable = NONPUB

Period(s) of Differencing                        1
Mean of Working Series                     0.30303
Standard Deviation                        0.513741
Number of Observations                          33
Observation(s) eliminated by differencing        1

                       Autocorrelations

Lag    Covariance    Correlation    Std Error
  0      0.263930        1.00000            0
  1      0.082387        0.31216     0.174078
  2     -0.025565        -.09686     0.190285
  3     0.0079422        0.03009     0.191774
  4      0.034691        0.13144     0.191917
  5      0.010641        0.04032     0.194626
  6     -0.014492        -.05491     0.194879
  7     -0.028101        -.10647     0.195347
  8     -0.018074        -.06848     0.197098
  9      0.024708        0.09362     0.197817
 10    -0.0016373        -.00620     0.199155

                  Inverse Autocorrelations

Lag    Correlation
  1       -0.43944
  2        0.22666
  3       -0.08116
  4       -0.06102
  5        0.01986
  6        0.03297
  7       -0.04359
  8        0.14827
  9       -0.15649
 10        0.07695

                  Partial Autocorrelations

Lag    Correlation
  1        0.31216
  2       -0.21528
  3        0.15573
  4        0.05219
  5       -0.00997
  6       -0.03898
  7       -0.09630
  8       -0.02738
  9        0.12247
 10       -0.10058

             Autocorrelation Check for White Noise

Pr > ChiSq    ------Autocorrelations------
    0.5716     0.312    -0.097     0.030

            Conditional Least Squares Estimation

                          Standard                 Approx
Parameter    Estimate        Error    t Value    Pr > |t|
MU            0.30330      0.12294       2.47      0.0193
MA1,1        -0.46626      0.16148      -2.89      0.0070

Constant Estimate          0.3033
Variance Estimate        0.238422
Std Error Estimate       0.488284
AIC                      48.27419
SBC                       51.2672
Number of Residuals            33
* AIC and SBC do not include log determinant.

Correlations of Parameter Estimates

Parameter    MU    MA1,1

               Autocorrelation Check of Residuals

 To     Chi-           Pr >
Lag   Square    DF    ChiSq    ----------------Autocorrelations----------------
  6     1.01     5   0.9619    -0.033   -0.089    0.020    0.119   -0.032   -0.036
 12     3.80    11   0.9754    -0.054   -0.114    0.157   -0.093    0.072   -0.057
 18     7.41    17   0.9776     0.064    0.001   -0.175   -0.108   -0.027    0.085
 24    10.07    23   0.9909     0.007   -0.023    0.067   -0.003   -0.123    0.057

Model for Variable NONPUB

Estimated Mean                 0.3033
Period(s) of Differencing           1

Moving Average Factors
Factor 1: 1 + 0.46626 B**(1)

               Forecasts for variable NONPUB

Obs    Forecast    Std Error    95% Confidence Limits
 35     43.5635       0.4883     42.6065     44.5206
 36     43.8668       0.8666     42.1683     45.5654
 37     44.1701       1.1241     41.9669     46.3733
 38     44.4734       1.3327     41.8613     47.0855
 39     44.7767       1.5129     41.8116     47.7419
 40     45.0800       1.6737     41.7996     48.3605
 41     45.3833       1.8204     41.8154     48.9513
 42     45.6866       1.9562     41.8526     49.5206
 43     45.9899       2.0831     41.9072     50.0727
 44     46.2932       2.2027     41.9761     50.6104
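
The forecast columns of Output 3.24 can be reproduced from the fitted model, which may help in reading them. The sketch below assumes the standard result that for $(1 - B)Y_t = \mu + (1 + cB)e_t$ the forecast error variance at lead L is $\sigma^2[1 + (L - 1)(1 + c)^2]$; the function names are illustrative, and the first forecast (43.5635) is taken from the output rather than recomputed.

```python
import math

sigma = 0.488284     # "Std Error Estimate" from Output 3.24
c = 0.46626          # moving average factor is 1 + 0.46626 B
drift = 0.3033       # estimated mean of the differences (MU)
first_forecast = 43.5635   # lead 1 forecast, copied from the output

def forecast_std_error(lead):
    # After differencing is undone, the weights on past shocks are 1, 1 + c, 1 + c, ...
    psi = 1.0 + c
    return sigma * math.sqrt(1.0 + (lead - 1) * psi ** 2)

def forecast(lead):
    # Beyond lead 1 the MA(1) term contributes nothing, so forecasts rise by the drift.
    return first_forecast + (lead - 1) * drift

for lead in range(1, 11):
    print(f"lead {lead:2d}: forecast {forecast(lead):8.4f}  std error {forecast_std_error(lead):.4f}")
```

The forecasts increase by the drift 0.3033 per period while the standard errors grow roughly with the square root of the lead, which is why the confidence limits widen steadily.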


3.4.10 Other Identification Techniques 

In addition to the ACF, IACF, and PACF, three methods called ESACF, SCAN, and MINIC are
available for simultaneously identifying both the autoregressive and moving average orders. These
consist of tables with rows labeled AR 0, AR 1, etc. and columns MA 0, MA 1, etc. You look at the
table entries to find the row and column whose labels give the correct p and q. Tsay and Tiao (1984,
1985) develop the ESACF and SCAN methods and show they even work when the autoregressive
operator has roots on the unit circle, in which case p + d rather than p is found. For

$(Y_t - Y_{t-2}) - 0.7(Y_{t-1} - Y_{t-3}) = e_t$

ESACF and SCAN should give 3 as the autoregressive order. The
key to showing their results is that standard estimation techniques give consistent estimators of the
autoregressive operator coefficients even in the presence of unit roots.

These methods can be understood through an ARMA(1,1) example. Suppose you have the
ARMA(1,1) process $Z_t - \alpha Z_{t-1} = e_t - \beta e_{t-1}$, where $Z_t$ is the deviation from the mean at time t. The
autocorrelations $\rho(j)$ are $\rho(0) = 1$, $\rho(1) = [(\alpha - \beta)(1 - \alpha\beta)]/[1 - \alpha^2 + (\beta - \alpha)^2]$, and $\rho(j) = \alpha\rho(j - 1)$
for j > 1.
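
As a numerical illustration of these formulas, the sketch below evaluates the autocorrelations for $\alpha = .8$ and $\beta = .4$, the values used for the simulated series later in this section. The function name `arma11_acf` is illustrative.

```python
def arma11_acf(alpha, beta, max_lag):
    """Theoretical autocorrelations rho(0..max_lag) of Z_t - alpha*Z_{t-1} = e_t - beta*e_{t-1}.

    Assumes max_lag >= 1.
    """
    rho1 = (alpha - beta) * (1 - alpha * beta) / (1 - alpha ** 2 + (beta - alpha) ** 2)
    rho = [1.0, rho1]
    for _ in range(2, max_lag + 1):
        rho.append(alpha * rho[-1])   # rho(j) = alpha * rho(j-1) for j > 1
    return rho

rho = arma11_acf(0.8, 0.4, 5)
for lag, value in enumerate(rho):
    print(f"rho({lag}) = {value:.4f}")
```

Here $\rho(1)$ is about 0.523, and the later values decay geometrically at rate $\alpha$ rather than cutting off, which is why the ACF alone does not reveal the orders.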


The partial autocorrelations are motivated by the problem of finding the best linear predictor of $Z_t$
based on $Z_{t-1}, \ldots, Z_{t-k}$. That is, you want to find coefficients $\phi_{kj}$ for which

$E\{(Z_t - \phi_{k1}Z_{t-1} - \phi_{k2}Z_{t-2} - \cdots - \phi_{kk}Z_{t-k})^2\}$

is minimized. This is sometimes referred to as “performing
a theoretical regression” of $Z_t$ on $Z_{t-1}, Z_{t-2}, \ldots, Z_{t-k}$ or “projecting” $Z_t$ onto the space spanned by
$Z_{t-1}, Z_{t-2}, \ldots, Z_{t-k}$. It is accomplished by solving the matrix system of equations

$$
\begin{pmatrix}
1 & \rho(1) & \cdots & \rho(k-1) \\
\rho(1) & 1 & \cdots & \rho(k-2) \\
\vdots & \vdots & \ddots & \vdots \\
\rho(k-1) & \rho(k-2) & \cdots & 1
\end{pmatrix}
\begin{pmatrix} \phi_{k1} \\ \phi_{k2} \\ \vdots \\ \phi_{kk} \end{pmatrix}
=
\begin{pmatrix} \rho(1) \\ \rho(2) \\ \vdots \\ \rho(k) \end{pmatrix}
$$

Letting $\pi_k = \phi_{kk}$ for k = 1, 2, ... produces the sequence $\pi_k$ of partial autocorrelations. (See Section
3.3.2.3.)
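
The system above does not have to be solved from scratch for each k. The sketch below uses the Durbin-Levinson recursion, a standard way to obtain $\phi_{kk}$ from the autocorrelations; it is offered as an illustration and is not claimed to be what PROC ARIMA does internally. The function name and the parameter values $\alpha = .8$, $\beta = .4$ are choices made for this example.

```python
def pacf_from_acf(rho, k_max):
    """Partial autocorrelations phi_11, ..., phi_kk via the Durbin-Levinson recursion.

    rho[j] is the lag j autocorrelation, with rho[0] == 1.
    """
    phi_prev = []          # phi_{k-1,1}, ..., phi_{k-1,k-1}
    pacf = []
    for k in range(1, k_max + 1):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

# ARMA(1,1) autocorrelations for alpha = .8, beta = .4.
alpha, beta = 0.8, 0.4
rho = [1.0, (alpha - beta) * (1 - alpha * beta) / (1 - alpha ** 2 + (beta - alpha) ** 2)]
for _ in range(2, 6):
    rho.append(alpha * rho[-1])

arma_pacf = pacf_from_acf(rho, 4)
ar1_pacf = pacf_from_acf([0.5 ** j for j in range(6)], 4)   # pure AR(1) with coefficient .5
print("ARMA(1,1):", [round(v, 4) for v in arma_pacf])
print("AR(1):    ", [round(v, 4) for v in ar1_pacf])
```

For the pure autoregression the partial autocorrelations cut off after lag 1, while for the ARMA(1,1) they do not, and $\phi_{11}$ equals $\rho(1)$, not $\alpha$.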

At k = 1 in the ARMA(1,1) example, you note that

$\phi_{11} = \pi_1 = \rho(1) = [(\alpha - \beta)(1 - \alpha\beta)]/[1 - \alpha^2 + (\beta - \alpha)^2]$, which does not in general equal $\alpha$. Therefore
$Z_t - \phi_{11}Z_{t-1}$ is not $Z_t - \alpha Z_{t-1}$ and thus does not equal $e_t - \beta e_{t-1}$. The autocorrelations of $Z_t - \phi_{11}Z_{t-1}$
would not drop to 0 beyond the moving average order. Increasing k beyond 1 will not solve the
problem. Still, it is clear that there is some linear combination of $Z_t$ and $Z_{t-1}$, namely $Z_t - \alpha Z_{t-1}$,
whose autocorrelations theoretically identify the order of the moving average part of your model. In
general neither the $\pi_k$ sequence nor any $\phi_{kj}$ sequence contains the autoregressive coefficients unless
the process is a pure autoregression. You are looking for a linear combination
$Z_t - C_1 Z_{t-1} - C_2 Z_{t-2} - \cdots - C_p Z_{t-p}$ whose autocorrelation is 0 for j exceeding the moving average
order q (1 in our example). The trick is to discover p and the $C_j$s from the data.


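The lagged residual from the theoretical regression of $Z_t$ on $Z_{t-1}$ is $R_{1,t-1} = Z_{t-1} - \phi_{11}Z_{t-2}$, which is a
linear combination of $Z_{t-1}$ and $Z_{t-2}$, so regressing $Z_t$ on $Z_{t-1}$ and $R_{1,t-1}$ produces regression
coefficients, say $C_{21}$ and $C_{22}$, which give the same fit, or projection, as regressing $Z_t$ on $Z_{t-1}$ and
$Z_{t-2}$. That is, $C_{21}Z_{t-1} + C_{22}R_{1,t-1} = C_{21}Z_{t-1} + C_{22}(Z_{t-1} - \phi_{11}Z_{t-2}) = \phi_{21}Z_{t-1} + \phi_{22}Z_{t-2}$. Thus it must be that
$\phi_{21} = C_{21} + C_{22}$ and $\phi_{22} = -\phi_{11}C_{22}$. In matrix form

$$
\begin{pmatrix} \phi_{21} \\ \phi_{22} \end{pmatrix}
=
\begin{pmatrix} 1 & 1 \\ 0 & -\phi_{11} \end{pmatrix}
\begin{pmatrix} C_{21} \\ C_{22} \end{pmatrix}
$$

Noting that $\rho(2) = \alpha\rho(1)$, the $\phi_{2j}$ coefficients satisfy

$$
\begin{pmatrix} 1 & \rho(1) \\ \rho(1) & 1 \end{pmatrix}
\begin{pmatrix} \phi_{21} \\ \phi_{22} \end{pmatrix}
= \rho(1)\begin{pmatrix} 1 \\ \alpha \end{pmatrix}
$$

Relating this to the Cs and noting that $\rho(1) = \phi_{11}$, you have

$$
\begin{pmatrix} 1 & \phi_{11} \\ \phi_{11} & 1 \end{pmatrix}
\begin{pmatrix} 1 & 1 \\ 0 & -\phi_{11} \end{pmatrix}
\begin{pmatrix} C_{21} \\ C_{22} \end{pmatrix}
= \phi_{11}\begin{pmatrix} 1 \\ \alpha \end{pmatrix}
$$

or

$$
\begin{pmatrix} C_{21} \\ C_{22} \end{pmatrix}
= \frac{\phi_{11}}{\phi_{11}(\phi_{11}^2 - 1)}
\begin{pmatrix} 0 & \phi_{11}^2 - 1 \\ -\phi_{11} & 1 \end{pmatrix}
\begin{pmatrix} 1 \\ \alpha \end{pmatrix}
=
\begin{pmatrix} \alpha \\ (\alpha - \phi_{11})/(\phi_{11}^2 - 1) \end{pmatrix}
$$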
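
The result $C_{21} = \alpha$ can be confirmed numerically. The sketch below solves the 2 x 2 Yule-Walker system by Cramer's rule and then converts the $\phi$ coefficients to the C coefficients using the two identities stated in the text; the parameter values $\alpha = .8$, $\beta = .4$ are chosen only for illustration.

```python
alpha, beta = 0.8, 0.4

# phi_11 equals rho(1) for the ARMA(1,1) process.
phi11 = (alpha - beta) * (1 - alpha * beta) / (1 - alpha ** 2 + (beta - alpha) ** 2)
rho1, rho2 = phi11, alpha * phi11

# Solve [[1, rho1], [rho1, 1]] (phi21, phi22)' = (rho1, rho2)' by Cramer's rule.
det = 1.0 - rho1 ** 2
phi21 = (rho1 - rho1 * rho2) / det
phi22 = (rho2 - rho1 ** 2) / det

# Invert phi21 = C21 + C22 and phi22 = -phi11 * C22.
c22 = -phi22 / phi11
c21 = phi21 - c22

print(f"phi11 = {phi11:.4f}, phi21 = {phi21:.4f}, phi22 = {phi22:.4f}")
print(f"C21   = {c21:.4f} (alpha = {alpha})")
print(f"C22   = {c22:.4f}")
```

Neither $\phi_{11}$ nor $\phi_{21}$ equals $\alpha$, but the coefficient on $Z_{t-1}$ in the regression that includes the lagged residual does.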
You now “filter” Z using only $C_{21} = \alpha$; that is, you compute $Z_t - C_{21}Z_{t-1}$, which is just $Z_t - \alpha Z_{t-1}$,
and this in turn is a moving average of order 1. Its lag 1 autocorrelation (it is nonzero) will appear in
the AR 1 row and MA 0 column of the ESACF table. Let the residual from this regression be denoted
$R_{2t}$. The next step is to regress $Z_t$ on $Z_{t-1}$, $R_{1,t-2}$, and $R_{2,t-1}$. In this regression, the theoretical
coefficient of $Z_{t-1}$ will again be $\alpha$, but its estimate may differ somewhat from the one obtained
previously. Notice the use of the lagged value of $R_{2t}$ and the second lag of the first round residual
$R_{1,t-2} = Z_{t-2} - \phi_{11}Z_{t-3}$. The lag 2 autocorrelation of $Z_t - \alpha Z_{t-1}$, which is 0, will be written in the MA
1 column of the AR 1 row. For the ESACF of a general ARMA(p,q), in the AR p row, once your
regression has at least q lagged residuals, the first p theoretical $C_{kj}$ will be the p autoregressive
coefficients and the filtered series will be an MA(q), so its autocorrelations will be 0 beyond lag q.

The entries in the AR k row of the ESACF table are computed as follows:

(1) Regress $Z_t$ on $Z_{t-1}, Z_{t-2}, \ldots, Z_{t-k}$ with residual $R_{1t}$.
    Coefficients: $C_{11}, C_{12}, \ldots, C_{1k}$.

(2) Regress $Z_t$ on $Z_{t-1}, Z_{t-2}, \ldots, Z_{t-k}, R_{1,t-1}$ with residual $R_{2t}$.
    Second-round coefficients: $C_{21}, \ldots, C_{2k}$ (and $C_{2,k+1}$).
    Record in the MA 0 column the lag 1 autocorrelation of
    $Z_t - C_{21}Z_{t-1} - C_{22}Z_{t-2} - \cdots - C_{2k}Z_{t-k}$.

(3) Regress $Z_t$ on $Z_{t-1}, Z_{t-2}, \ldots, Z_{t-k}, R_{1,t-2}, R_{2,t-1}$ with residual $R_{3t}$.
    Third-round coefficients: $C_{31}, \ldots, C_{3k}$ (and $C_{3,k+1}, C_{3,k+2}$).
    Record in the MA 1 column the lag 2 autocorrelation of
    $Z_t - C_{31}Z_{t-1} - C_{32}Z_{t-2} - \cdots - C_{3k}Z_{t-k}$.

etc.

Notice that at each step, you lag all residuals that were previously included as regressors and add the
lag of the most recent residual to your regression. The estimated C coefficients and resulting filtered
series differ at each step. Looking down the ESACF table of an ARMA(p,q), theoretically row p should
be the first row in which a string of 0s appears and it should start at the MA q column. Finding that
row and the first 0 entry in it puts you in row p column q of the ESACF. The model is now identified.

Here is a theoretical ESACF table for an ARMA(1,1) with “X” for nonzero numbers:

         MA 0   MA 1   MA 2   MA 3   MA 4   MA 5
AR 0      X      X      X      X      X      X
AR 1      X      0*     0      0      0      0
AR 2      X      X      0      0      0      0
AR 3      X      X      X      0      0      0
AR 4      X      X      X      X      0      0

The string of 0s slides to the right as the AR row number moves beyond p, so there appears a
triangular array of 0s whose “point” 0* is at the correct (p,q) combination.

In practice, the theoretical regressions are replaced by least squares regressions, so the ESACF table
will only have numbers near 0 where the theoretical ESACF table has 0s. A recursive algorithm is
used to quickly compute the needed coefficients without having to compute so many actual
regressions. PROC ARIMA will also use asymptotically valid standard errors based on Bartlett's
formula to deliver a table of approximate p-values for the ESACF entries and will suggest values of p
and q as a tentative identification. See Tsay and Tiao (1984) for further details.

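Tsay and Tiao (1985) suggest a second table called SCAN. It is computed using canonical
correlations. For the ARMA(1,1) model, recall that the autocovariances are $\gamma(0)$, $\gamma(1)$,
$\gamma(2) = \alpha\gamma(1)$, $\gamma(3) = \alpha^2\gamma(1)$, $\gamma(4) = \alpha^3\gamma(1)$, etc., so the covariance matrix of $Y_t, Y_{t-1}, \ldots, Y_{t-5}$ is

$$
\Gamma =
\begin{pmatrix}
\gamma(0) & \gamma(1) & \alpha\gamma(1) & \alpha^2\gamma(1) & \alpha^3\gamma(1) & \alpha^4\gamma(1) \\
\gamma(1) & \gamma(0) & \gamma(1) & \alpha\gamma(1) & \alpha^2\gamma(1) & \alpha^3\gamma(1) \\
[\alpha\gamma(1)] & [\gamma(1)] & \gamma(0) & \gamma(1) & \alpha\gamma(1) & \alpha^2\gamma(1) \\
[\alpha^2\gamma(1)] & [\alpha\gamma(1)] & \gamma(1) & \gamma(0) & \gamma(1) & \alpha\gamma(1) \\
\alpha^3\gamma(1) & \alpha^2\gamma(1) & \alpha\gamma(1) & \gamma(1) & \gamma(0) & \gamma(1) \\
\alpha^4\gamma(1) & \alpha^3\gamma(1) & \alpha^2\gamma(1) & \alpha\gamma(1) & \gamma(1) & \gamma(0)
\end{pmatrix}
$$
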

The entries in square brackets form the 2x2 submatrix of covariances between the vectors
$(Y_t, Y_{t-1})$ and $(Y_{t-2}, Y_{t-3})$. That submatrix A, the variance matrix $C_{11}$ of $(Y_t, Y_{t-1})$, and the variance
matrix $C_{22}$ of $(Y_{t-2}, Y_{t-3})$ are

$$
A = \gamma(1)\begin{pmatrix} \alpha & 1 \\ \alpha^2 & \alpha \end{pmatrix},
\qquad
C_{11} = C_{22} = \begin{pmatrix} \gamma(0) & \gamma(1) \\ \gamma(1) & \gamma(0) \end{pmatrix}
$$

The best linear predictor of $(Y_t, Y_{t-1})'$ based on $(Y_{t-2}, Y_{t-3})'$ is $A'C_{22}^{-1}(Y_{t-2}, Y_{t-3})'$ with prediction
error variance matrix $C_{11} - A'C_{22}^{-1}A$. Because matrix $C_{11}$ represents the variance of $(Y_t, Y_{t-1})$, the
matrix $C_{11}^{-1}A'C_{22}^{-1}A$ is analogous to a regression $R^2$ statistic. Its eigenvalues are called squared
canonical correlations between $(Y_t, Y_{t-1})'$ and $(Y_{t-2}, Y_{t-3})'$.

Recall that, for a square matrix M, if a column vector H exists such that MH = bH, then H is called
an eigenvector and the scalar b is the corresponding eigenvalue of matrix M. Using $H = (1, -\alpha)'$, you
see that AH = (0,0)', so $C_{11}^{-1}A'C_{22}^{-1}AH = 0H$; that is, $C_{11}^{-1}A'C_{22}^{-1}A$ has an eigenvalue 0. The number
of 0 eigenvalues of A is the same as the number of 0 eigenvalues of $C_{11}^{-1}A'C_{22}^{-1}A$. This is true for
general time series covariance matrices.
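
The zero eigenvalue can be seen numerically for the 2x2 case. The sketch below builds A, $C_{11}$, and $C_{22}$ for an ARMA(1,1) with $\alpha = .8$ and $\beta = .4$ (illustrative values, with $\gamma(0)$ scaled to 1 so that $\gamma(1) = \rho(1)$), checks that AH = 0, and uses the determinant identity $\det(C_{11}^{-1}A'C_{22}^{-1}A) = \det(A)^2/[\det(C_{11})\det(C_{22})]$ to confirm that the product of the two squared canonical correlations is 0.

```python
alpha, beta = 0.8, 0.4
g0 = 1.0   # gamma(0), scaled to 1 (assumption for illustration)
g1 = (alpha - beta) * (1 - alpha * beta) / (1 - alpha ** 2 + (beta - alpha) ** 2)  # gamma(1)

# Covariances between (Y_{t-2}, Y_{t-3}) (rows) and (Y_t, Y_{t-1}) (columns).
A = [[alpha * g1, g1],
     [alpha ** 2 * g1, alpha * g1]]
C = [[g0, g1],
     [g1, g0]]          # C11 and C22 are the same matrix here

H = [1.0, -alpha]
AH = [A[0][0] * H[0] + A[0][1] * H[1],
      A[1][0] * H[0] + A[1][1] * H[1]]

def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# Product of the two squared canonical correlations.
product_of_eigenvalues = det2(A) ** 2 / (det2(C) * det2(C))

print("AH =", AH)
print("det(A) =", det2(A))
print("product of squared canonical correlations =", product_of_eigenvalues)
```

Because A has rank 1, at least one squared canonical correlation must be 0, which is the entry that the SCAN table records.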
The matrix A has first column that is $\alpha$ times the second, which implies these equivalent statements:

(1) The 2x2 matrix A is not of full rank (its rank is 1).

(2) The 2x2 matrix A has at least one eigenvalue 0.

(3) The 2x2 matrix $C_{11}^{-1}A'C_{22}^{-1}A$ has at least one eigenvalue 0.

(4) The vectors $(Y_t, Y_{t-1})$ and $(Y_{t-2}, Y_{t-3})$ have at least one squared canonical correlation that is 0.

The fourth of these statements is easily seen. The linear combination $Y_t - \alpha Y_{t-1}$ and its second lag
$Y_{t-2} - \alpha Y_{t-3}$ have correlation 0 because each is an MA(1). The smallest canonical correlation is
obtained by taking linear combinations of $(Y_t, Y_{t-1})$ and $(Y_{t-2}, Y_{t-3})$ and finding the pair with
correlation closest to 0. Since there exist linear combinations in the two sets that are uncorrelated,
the smallest canonical correlation must be 0. Again you have a method of finding a linear
combination whose autocorrelation sequence is 0 beyond the moving average lag q.

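In general, construct an arbitrarily large covariance matrix of $Y_t, Y_{t-1}, Y_{t-2}, \ldots$, and let $A_{j,m}$ be the
m x m matrix whose upper-left element is in row j + 1, column 1 of the original matrix. In this
notation, the A with square bracketed elements is denoted $A_{2,2}$ and the bottom left 3x3 matrix of
$\Gamma$ is $A_{3,3}$. Again there is a full-rank 3x2 matrix H for which $A_{3,3}H$ has all 0 elements, namely

$$
A_{3,3}H =
\begin{pmatrix}
\alpha^2\gamma(1) & \alpha\gamma(1) & \gamma(1) \\
\alpha^3\gamma(1) & \alpha^2\gamma(1) & \alpha\gamma(1) \\
\alpha^4\gamma(1) & \alpha^3\gamma(1) & \alpha^2\gamma(1)
\end{pmatrix}
\begin{pmatrix} 1 & 0 \\ -\alpha & 1 \\ 0 & -\alpha \end{pmatrix}
=
\begin{pmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{pmatrix}
$$

showing that matrix $A_{3,3}$ has (at least) 2 eigenvalues that are 0 with the columns of H being the
corresponding eigenvectors. Similarly, using $A_{3,2}$ and $H = (1, -\alpha)'$

$$
A_{3,2}H =
\begin{pmatrix}
\alpha^2\gamma(1) & \alpha\gamma(1) \\
\alpha^3\gamma(1) & \alpha^2\gamma(1)
\end{pmatrix}
\begin{pmatrix} 1 \\ -\alpha \end{pmatrix}
=
\begin{pmatrix} 0 \\ 0 \end{pmatrix}
$$
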
so $A_{3,2}$ has (at least) one 0 eigenvalue, as does $A_{j,2}$ for all j > 1. In fact all $A_{j,m}$ with j > 1 and
m > 1 have at least one 0 eigenvalue for this example. For general ARMA(p,q) models, all $A_{j,m}$
with j > q and m > p have at least one 0 eigenvalue. This provides the key to the SCAN table. If
you make a table whose jth row, mth column entry is the smallest canonical correlation derived
from $A_{j,m}$, you have this table for the current example:

          m = 1   m = 2   m = 3   m = 4
j = 1       X       X       X       X
j = 2       X       0       0       0
j = 3       X       0       0       0
j = 4       X       0       0       0

          p = 0   p = 1   p = 2   p = 3
q = 0       X       X       X       X
q = 1       X       0       0       0
q = 2       X       0       0       0
q = 3       X       0       0       0

where the Xs represent nonzero numbers. Relabeling the rows and columns with q = j - 1 and
p = m - 1 gives the SCAN (smallest canonical correlation) table. It has a rectangular array of 0s
whose upper-left corner is at the p and q corresponding to the correct model, ARMA(1,1) for the
current example. The first column of the SCAN table consists of the autocorrelations and the first
row consists of the partial autocorrelations.

In PROC ARIMA, entries of the 6x6 variance-covariance matrix $\Gamma$ above would be replaced by
estimated autocovariances. To see why the 0s appear for an ARMA(p,q) whose autoregressive
coefficients are $\alpha_i$, you notice from the Yule-Walker equations that
$\gamma(j) - \alpha_1\gamma(j-1) - \alpha_2\gamma(j-2) - \cdots - \alpha_p\gamma(j-p)$ is zero for j > q. Therefore, in the variance
covariance matrix for such a process, any m x m submatrix with m > p whose upper-left element is
at row j, column 1 of the original matrix will have at least one 0 eigenvalue with eigenvector
$(1, -\alpha_1, -\alpha_2, \ldots, -\alpha_p, 0, 0, \ldots, 0)'$ if j > q. Hence 0 will appear in the theoretical table whenever
m > p and j > q. Approximate standard errors are obtained by applying Bartlett's formula to the
series filtered by the autoregressive coefficients, which in turn can be extracted from the H matrix
(eigenvectors). An asymptotically valid test, again making use of Bartlett's formula, is available and
PROC ARIMA displays a table of the resulting p-values.
The MINIC method simply attempts to fit models over a grid of p and q choices, and records the
SBC information criterion for each fit in a table. The Schwarz Bayesian Information Criterion is
$SBC = n\ln(s^2) + (p + q)\ln(n)$, where p and q are the autoregressive and moving average orders of
the candidate model and $s^2$ is an estimate of the innovations variance. Some sources refer to
Schwarz’s criterion, perhaps normalized by n, as BIC. Here, the symbol SBC is used so that
Schwarz’s criterion will not be confused with the BIC criterion of Sawa (1978). Sawa’s BIC, used as
a model selection tool in PROC REG, is $n\ln(s^2) + 2[(k + 2)\frac{n}{n-k} - (\frac{n}{n-k})^2]$ for a full regression model
with n observations and k parameters. The MINIC technique chooses p and q giving the smallest
SBC. It is possible, of course, that the fitting will fail due to singularities in which case the SBC is set
to missing.

The fitting of models in computing MINIC follows a clever algorithm suggested by Hannan and
Rissanen (1982) using ideas dating back to Durbin (1960). First, using the Yule-Walker equations, a
long autoregressive model is fit to the data. For the ARMA(1,1) example of this section it is seen that

$Y_t = (\alpha - \beta)[Y_{t-1} + \beta Y_{t-2} + \beta^2 Y_{t-3} + \cdots] + e_t$

and as long as $|\beta| < 1$, the coefficients on lagged Y will die off quite quickly, indicating that a
truncated version of this infinite autoregression will approximate the $e_t$ process well. To the extent
that this is true, the Yule-Walker equations for a length k (k large) autoregression can be solved to
give estimates, say $b_j$, of the coefficients of the $Y_{t-j}$ terms and a residual series
$\hat{e}_t = Y_t - b_1 Y_{t-1} - b_2 Y_{t-2} - \cdots - b_k Y_{t-k}$ that is close to the actual $e_t$ series. Next, for a candidate model
of order p,q, regress $Y_t$ on $Y_{t-1}, \ldots, Y_{t-p}, \hat{e}_{t-1}, \hat{e}_{t-2}, \ldots, \hat{e}_{t-q}$. Letting $\hat{\sigma}^2$ be 1/n times the error sum of
squares for this regression, pick p and q to minimize the SBC criterion
$SBC = n\ln(\hat{\sigma}^2) + (p + q)\ln(n)$. The length of the autoregressive model for the $\hat{e}_t$ series can be
selected by minimizing the AIC criterion.
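
To see how quickly the coefficients of the infinite autoregression die off, the sketch below lists the theoretical weights $(\alpha - \beta)\beta^{j-1}$ on $Y_{t-j}$ for $\alpha = .8$ and $\beta = .4$, the values used in the example that follows. The cutoff of 10 lags is an arbitrary choice for display.

```python
alpha, beta = 0.8, 0.4

# Weight on Y_{t-j} in Y_t = (alpha - beta)[Y_{t-1} + beta*Y_{t-2} + ...] + e_t.
weights = [(alpha - beta) * beta ** (j - 1) for j in range(1, 11)]

for j, w in enumerate(weights, start=1):
    print(f"lag {j:2d}: {w:.6f}")
```

By lag 9 the weight is below 0.001, so a moderately long autoregression already reproduces the $e_t$ series closely in this case.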

To illustrate, 1000 observations on an ARMA(1,1) with $\alpha = .8$ and $\beta = .4$ are generated and
analyzed. The following code generates Output 3.25:

   PROC ARIMA DATA=A;
      I VAR=Y NLAG=1 MINIC P=(0:5) Q=(0:5);
      I VAR=Y NLAG=1 ESACF P=(0:5) Q=(0:5);
      I VAR=Y NLAG=1 SCAN P=(0:5) Q=(0:5);
   RUN;

Output 3.25 ESACF, SCAN and MINIC Displays

                     Minimum Information Criterion

Lags        MA 0        MA 1        MA 2        MA 3        MA 4        MA 5
AR 0     0.28456    0.177502    0.117561    0.059353    0.028157    0.003877
AR 1     -0.0088    -0.04753    -0.04502     -0.0403    -0.03565    -0.03028
AR 2    -0.03958    -0.04404    -0.04121     -0.0352    -0.03027    -0.02428
AR 3    -0.04837    -0.04168    -0.03537    -0.02854    -0.02366    -0.01792
AR 4    -0.04386    -0.03696    -0.03047    -0.02372    -0.01711    -0.01153
AR 5    -0.03833    -0.03145    -0.02461     -0.0177    -0.01176    -0.00497

Error series model: AR(9)
Minimum Table Value: BIC(3,0) = -0.04837

                Extended Sample Autocorrelation Function

Lags        MA 0        MA 1        MA 2        MA 3        MA 4        MA 5
AR 0      0.5055      0.3944      0.3407      0.2575      0.2184      0.1567
AR 1     -0.3326     -0.0514      0.0564     -0.0360      0.0417     -0.0242
AR 2     -0.4574     -0.2993      0.0197      0.0184      0.0186     -0.0217
AR 3     -0.1207     -0.2357      0.1902      0.0020      0.0116      0.0006
AR 4     -0.4074     -0.1753      0.1942     -0.0132      0.0119      0.0015
AR 5      0.4836      0.1777     -0.0733      0.0336      0.0388     -0.0051

                       ESACF Probability Values

Lags        MA 0        MA 1        MA 2        MA 3        MA 4        MA 5
AR 0      0.0001      0.0001      0.0001      0.0001      0.0001      0.0010
AR 1      0.0001      0.1489      0.1045      0.3129      0.2263      0.4951
AR 2      0.0001      0.0001      0.5640      0.6013      0.5793      0.6003
AR 3      0.0001      0.0001      0.0001      0.9598      0.7634      0.9874
AR 4      0.0001      0.0001      0.0001      0.7445      0.7580      0.9692
AR 5      0.0001      0.0001      0.0831      0.3789      0.2880      0.8851

ARMA(p+d,q) Tentative Order Selection Tests
(5% Significance Level)

ESACF    p+d    q
           1    1
           4    3
           5    2

                Squared Canonical Correlation Estimates

Lags        MA 0        MA 1        MA 2        MA 3        MA 4        MA 5
AR 0      0.2567      0.1563      0.1170      0.0670      0.0483      0.0249
AR 1      0.0347      0.0018      0.0021      0.0008      0.0011      0.0003
AR 2      0.0140      0.0023      0.0002      0.0002      0.0002      0.0010
AR 3      0.0002      0.0007      0.0002      0.0001      0.0002      0.0001
AR 4      0.0008      0.0010      0.0002      0.0002      0.0002      0.0002
AR 5      0.0005      0.0001      0.0002      0.0001      0.0002      0.0004

                 SCAN Chi-Square[1] Probability Values

Lags        MA 0        MA 1        MA 2        MA 3        MA 4        MA 5
AR 0      0.0001      0.0001      0.0001      0.0001      0.0001      0.0010
AR 1      0.0001      0.2263      0.1945      0.4097      0.3513      0.5935
AR 2      0.0002      0.1849      0.7141      0.6767      0.7220      0.3455
AR 3      0.6467      0.4280      0.6670      0.9731      0.6766      0.9877
AR 4      0.3741      0.3922      0.6795      0.6631      0.7331      0.7080
AR 5      0.4933      0.8558      0.7413      0.9111      0.6878      0.6004

ARMA(p+d,q) Tentative Order Selection Tests
(5% Significance Level)

SCAN     p+d    q
           1    1
           3    0

The tentative order selections in ESACF and SCAN simply look at all triangles (rectangles) for
which every element is insignificant at the specified level (0.05 by default). These are listed in
descending order of size (below the tables), size being the number of elements in the triangle or
rectangle. In our example ESACF and SCAN list the correct (1,1) order at
the top of the list. The MINIC criterion uses k = 9, a preliminary AR(9) model, to create the
estimated white noise series, then selects (p,q) = (3,0) as the order, this also being one choice given
by the SCAN option. The second smallest SBC, -.04753, occurs at the correct (p,q) = (1,1).
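
The ranking of the MINIC entries can be reproduced from the table in Output 3.25. The sketch below simply sorts the printed values; the list name `minic` and the indexing convention (row index = AR order, column index = MA order) are choices made for this illustration.

```python
# Minimum Information Criterion table from Output 3.25.
# Row index is the AR order p (0..5), column index is the MA order q (0..5).
minic = [
    [0.28456, 0.177502, 0.117561, 0.059353, 0.028157, 0.003877],
    [-0.0088, -0.04753, -0.04502, -0.0403, -0.03565, -0.03028],
    [-0.03958, -0.04404, -0.04121, -0.0352, -0.03027, -0.02428],
    [-0.04837, -0.04168, -0.03537, -0.02854, -0.02366, -0.01792],
    [-0.04386, -0.03696, -0.03047, -0.02372, -0.01711, -0.01153],
    [-0.03833, -0.03145, -0.02461, -0.0177, -0.01176, -0.00497],
]

# Sort all (value, p, q) triples so the smallest criterion comes first.
ranked = sorted((value, p, q) for p, row in enumerate(minic) for q, value in enumerate(row))

for value, p, q in ranked[:3]:
    print(f"(p,q) = ({p},{q})  criterion = {value}")
```

The two smallest entries differ only in the fourth decimal place, so the choice between (3,0) and (1,1) rests on a very small margin for this series.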

As a check on the relative merits of these methods, 50 ARMA(1,1) series each of length 500 are
generated for each of the 12 $(\alpha, \beta)$ pairs obtained by choosing $\alpha$ and $\beta$ from {-.9, -.3, .3, .9} such
that $\alpha \neq \beta$. This gives 600 series. For each, the ESACF, SCAN, and MINIC methods are used, the
results are saved, and the estimated p and q are extracted for each method. The whole experiment is
repeated with series of length 50. A final set of 600 runs for $Y_t = .5Y_{t-4} + e_t + .3e_{t-1}$ using n = 50
gives the last three columns. Asterisks indicate the correct model.






          ARMA(1,1) n=500         ARMA(1,1) n=50          ARMA(4,1) n=50
pq        BIC   ESACF   SCAN      BIC   ESACF   SCAN      BIC   ESACF   SCAN
00          2       1      1       25      40     25       69      64     35
01          0       0      0       48     146    126       28      46     33
02          0       0      0       17      21      8        5       9     11
03          0       0      0        7       4     16        4       6      2
04          0       0      0        7       3      2       41      20     35
05          0       0      0        6       1      0       14       0      2
10          1       0      0      112     101    145       28      15     38
11        252*    441*   461*      53*    165*   203*       5      47     78
12         13      23      8       16       7      9        1      10     30
13         13      18      8       12       0      2        0       1      3
14         17       5      3        2       0      1        3       0      0
15         53       6      0        5       0      0        4       0      0
20         95       6     12       91      41     18       26      16     19
21          9       6     25        9      22     14        2      42     25
22         24      46     32        4       8      7        3      62    121
23          1       0      1        1       2      1        1       2      8
24          4       0      2        3       0      1        2       1      0
25         10       0      1        6       0      0        0       0      0
30         35       2      9       50       6      8       30       9     21
31          5       3     11        1      10      3        3      23     27
32          3       6      1        3       4      0        3      21      7
33          3      15     13        0       2      0        1      16      2
34          5       2      0        1       0      0        0       0      0
35          4       0      0        2       0      0        0       0      0
40          5       0      0       61       6      6      170      66     98
41          3       0      5        3       4      0       10*     52*     0*
42          2       4      2        1       2      0        4      24      0
43          5       3      0        0       0      0        0      22      0
44          1       4      1        1       0      0        0       0      0
45          6       0      0        5       0      0        1       0      0
50          5       0      0       32       3      2      116       6      5
51          0       1      2       10       1      0       18      13      0
52          3       0      1        2       0      0        2       6      0
53          9       2      0        2       0      0        5       0      0
54          6       1      0        2       1      0        1       0      0
55          6       0      0        0       0      0        0       0      0
totals    600     595    599      600     600    597      600     599    600


It is reassuring that the methods almost never underestimate p or q when n is 500. For the 
ARMA(1,1) with parameters in this range, it appears that SCAN does slightly better than ESACF, 
with both being superior to MINIC. The SCAN and ESACF columns do not always add to 600 
because, for some cases, no rectangle or triangle can be found with all elements insignificant. 
Because SCAN compares the smallest normalized squared canonical correlation to a distribution 
that is appropriate for a randomly selected one, it is also very conservative. By analogy, even if 
5% of men exceed 6 feet in height, finding a random sample of 10 men whose shortest member 
exceeds 6 feet in height would be extremely rare. Thus the appearance of a significant 
bottom-right-corner element in the SCAN table, which would imply no rectangle of insignificant 
values, happens rarely, not the 30 times you would expect from 600(.05) = 30. 

The conservatism of the test also implies that for moderately large p and q there is a fairly good 
chance that a rectangle (triangle) of “insignificant” terms will appear by chance having p or q too 
small. Indeed, for 600 replicates of the model $Y_t = .5Y_{t-4} + e_t + .3e_{t-1}$ using n = 50, we see that 
(p,q) = (4,1) is rarely chosen by any technique, with SCAN giving no correct choices. There does 
not seem to be a universally preferable choice among the three. 
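
The height analogy can be made concrete with a one-line calculation. The sketch below is written in Python purely for illustration (it is not part of the SAS analysis), the function name is hypothetical, and it assumes the 10 heights are independent draws, which the entries of a SCAN table are not.

```python
def prob_minimum_exceeds(tail_prob, sample_size):
    """P(the smallest of sample_size independent draws exceeds the cutoff)."""
    # The minimum exceeds the cutoff only if every single draw does.
    return tail_prob ** sample_size


p_single = prob_minimum_exceeds(0.05, 1)     # one randomly chosen man
p_shortest = prob_minimum_exceeds(0.05, 10)  # shortest man in a sample of 10

print(p_single, p_shortest)
```

Referring the smallest of several statistics to a cutoff meant for a single randomly chosen one therefore rejects far less often than the nominal 5%, which is the sense in which the test is conservative.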

As a real data example, Output 3.26 shows monthly interbank loans in billions of dollars. The data 
were downloaded from the Federal Reserve Web site. Also shown are the differences (upper-right 
corner) and the corresponding log scale graphs. The data require differencing and the right-side 
graphs seem to indicate the need for logarithms to stabilize the variance. 

Output 3.26 Interbank Loans: Original (Top) and Logs (Bottom), Levels (Left) and Differences (Right) 

[Four plots of interbank loans against date, JAN70 through JAN10.] 

To identify the log transformed variable, called LOANS in the data set, use this code to get the 
SCAN table. 


PROC ARIMA DATA=IBL; 

IDENTIFY VAR=LOANS SCAN P= (0:5) Q=(0:5); 
RUN; 


Output 3.27 shows the SCAN results. They indicate several possible models. 


Output 3.27 SCAN Table for Interbank Loans 

The ARIMA Procedure 

Squared Canonical Correlation Estimates 

Lags      MA 0      MA 1      MA 2      MA 3      MA 4      MA 5 
AR 0    0.9976    0.9952    0.9931    0.9899    0.9868    0.9835 
AR 1    <.0001    0.0037    0.0397    0.0007    0.0024    0.0308 
AR 2    0.0037    0.0003    0.0317    0.0020    <.0001    0.0133 
AR 3    0.0407    0.0309    0.0274    0.0126    0.0134    0.0125 
AR 4    0.0004    0.0053    0.0076    0.0004    0.0022    0.0084 
AR 5    0.0058    0.0003    0.0067    0.0022    <.0001    0.0078 

SCAN Chi-Square[1] Probability Values 

Lags      MA 0      MA 1      MA 2      MA 3      MA 4      MA 5 
AR 0    <.0001    <.0001    <.0001    <.0001    <.0001    <.0001 
AR 1    0.9125    0.2653    0.0003    0.6474    0.3936    0.0019 
AR 2    0.2618    0.7467    0.0033    0.4419    0.9227    0.0940 
AR 3    0.0002    0.0043    0.0136    0.0856    0.0881    0.1302 
AR 4    0.7231    0.1942    0.1562    0.7588    0.4753    0.1589 
AR 5    0.1613    0.7678    0.1901    0.4709    0.9708    0.1836 

ARMA(p+d,q) Tentative Order Selection Tests 

---SCAN--- 
p+d     q 
  4     0 
  2     3 

(5% Significance Level) 

The SCAN table was computed on log transformed, undifferenced data. Therefore, the listed number 
p + d represents p + 1, and SCAN suggests ARIMA(3,1,0) or ARIMA(1,1,3). 

IDENTIFY VAR=LOANS(1) NOPRINT; 

ESTIMATE P=3 ML; 

ESTIMATE P=1 Q=3 ML; 

RUN; 

The chi-square checks for both of these models are insignificant at all lags, indicating both models fit 
well. Both models have some insignificant parameters and could be refined by omitting some lags if 
desired (output not shown). 


3.5 Summary 

The steps for analyzing nonseasonal univariate series are outlined below. 

1. Check for nonstationarity using 

□ data plot to monitor slow level shifts in the data (as in IBM example) 

□ ACF to monitor very slow decay (IBM or publishing and printing example) 

□ Dickey and Fuller test for stationarity (silver example). 

If any of these tests indicate nonstationarity, difference the series using VAR=Y(1) in the 
IDENTIFY statement and repeat step 1. If necessary, difference again by specifying 
VAR=Y(1,1). 

2. Check the Q statistic (CHI SQUARE) at the bottom of the printout. If Q is small (in other 
words, PROB is fairly large) and if the first few autocorrelations are small, you may want to 
assume that your (possibly differenced) series is just white noise. 

3. Check the ACF, IACF, and PACF to identify a model. If the ACF drops to 0 after q lags, this 
indicates an MA(q) model. If the IACF or PACF drops to 0 after p lags, this indicates an 
AR(p) model. If you have differenced the series once or twice, one or two MA lags are likely 
to be indicated. 

4. You can use the SCAN, ESACF, and MINIC tables to determine initial starting models to try 
in an ESTIMATE statement. 

5. Using the ESTIMATE statement, specify the model you picked (or several candidate models). 
For example, you fit the model 

$$Y_t - Y_{t-1} = (1 - \theta_1 B)e_t$$

by specifying these statements: 

PROC ARIMA DATA=SASDS; 

IDENTIFY VAR=Y(1); 

ESTIMATE Q=1 NOCONSTANT; 

RUN; 

6. Check the Q statistic (CHI SQUARE) at the bottom of the ESTIMATE printout. If it is 
insignificant, your model fits reasonably well according to this criterion. Otherwise, return to 
the original ACF, IACF, and PACF of your (possibly differenced) data to determine if you 
have missed something. This is generally more advisable than plotting the ACF of the 
residuals from this misspecified model. 

If you have differenced, the mean is often (IBM data), but not always (publishing and printing 
data), 0. Use the NOCONSTANT option to suppress the fitting of a constant. 

Fitting extra lags and excluding insignificant lags in an attempt to bypass identification causes 
unstable parameter estimates and possible convergence problems if you overfit on both sides 
(AR and MA) at once. Correlations of parameter estimates are extremely high in this case (if, 
in fact, the estimation algorithm converges). Overfitting on one side at a time to check the 
model is no problem. 

7. Use the FORECAST statement with LEAD=k to produce forecasts from the fitted model. It is 
a good idea to specify BACK=b to start the forecast b steps before the end of the series. You 
can then compare the last b forecasts to data values at the end of the series. If you note a large 
discrepancy, you may want to adjust your forecasts. Omit the BACK= option on your final 
forecast; it is used only as a diagnostic tool. 

8. Examine plots of residuals and possibly use PROC UNIVARIATE to examine the distribution 
and PROC SPECTRA to test the white noise assumption further. (See Chapter 7, “Spectral 
Analysis,” for more information.) 



Chapter 4 The ARIMA Model: Introductory Applications 

4.1 Seasonal Time Series 
    4.1.1 Introduction to Seasonal Modeling 
    4.1.2 Model Identification 
4.2 Models with Explanatory Variables 
    4.2.1 Case 1: Regression with Time Series Errors 
    4.2.2 Case 1A: Intervention 
    4.2.3 Case 2: Simple Transfer Function 
    4.2.4 Case 3: General Transfer Function 
    4.2.5 Case 3A: Leading Indicators 
    4.2.6 Case 3B: Intervention 
4.3 Methodology and Example 
    4.3.1 Case 1: Regression with Time Series Errors 
    4.3.2 Case 2: Simple Transfer Functions 
    4.3.3 Case 3: General Transfer Functions 
    4.3.4 Case 3B: Intervention 
4.4 Further Examples 
    4.4.1 North Carolina Retail Sales 
    4.4.2 Construction Series Revisited 
    4.4.3 Milk Scare (Intervention) 
    4.4.4 Terrorist Attack 


4.1 Seasonal Time Series 


4.1.1 Introduction to Seasonal Modeling 

The first priority in seasonal modeling is to specify correct differencing and appropriate 
transformations. This topic is discussed first, followed by model identification. The potential 
behavior of autocorrelation functions (ACFs) for seasonal models is not easy to characterize, but 
ACFs are given for a few seasonal models. You should find a pattern that matches your data among 
these diagnostic plots. 

Consider the model 

$$Y_t - \mu = \alpha(Y_{t-12} - \mu) + e_t$$

where $e_t$ is white noise. This model is applied to monthly data and expresses this December’s Y, for 
example, as $\mu$ plus a proportion of last December’s deviation from $\mu$. If $\mu$ = 100, $\alpha$ = .8, and last 
December’s Y = 120, the model forecasts this December’s Y as 100 + .8(20) = 116. The forecast for the 
next December’s Y is 100 + .64(20), and the forecast for j Decembers ahead is $100 + .8^j(20)$. 

The model responds to change in the series because it uses only the most recent December to forecast 
the future. This approach contrasts with the indicator variables in the regression approach discussed 
in Chapter 1, “Overview of Time Series,” where the average of all December values goes into the 
forecast for this December. For the autoregressive (AR) seasonal model above, the further into the 
future you forecast, the closer your forecast is to the mean $\mu$. Suppose you allow $\alpha$ to be 1 in the 
AR seasonal model. Your model is nonstationary and reduces to 

$$Y_t = Y_{t-12} + e_t$$

This model uses last December’s Y as the forecast for next December (and for any other future 
December). The difference 

$$Y_t - Y_{t-12}$$

is stationary (white noise, in this case) and is specified using the PROC ARIMA statement 

IDENTIFY VAR=Y(12); 

This is called a span 12 difference. The forecast does not tend to return to the historical series mean, 
as evidenced by the lack of a $\mu$ term in the model. 
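
The contrast between the stationary seasonal AR forecast and the span 12 difference forecast can be checked numerically. The following is a minimal Python sketch for illustration only (the function name is hypothetical and Python is not part of the SAS workflow); it uses the mean of 100, the coefficient of .8, and last December's value of 120 from the example above, and then sets the coefficient to 1.

```python
def seasonal_forecast(mu, alpha, last_december, j):
    """Forecast j Decembers ahead for Y_t - mu = alpha*(Y_{t-12} - mu) + e_t."""
    return mu + alpha ** j * (last_december - mu)


# Stationary case from the text: mean 100, coefficient .8, last December 120.
stationary = [seasonal_forecast(100, 0.8, 120, j) for j in range(1, 6)]

# Coefficient equal to 1: the mean drops out and the forecast never decays.
span12 = [seasonal_forecast(100, 1.0, 120, j) for j in range(1, 6)]

print([round(f, 3) for f in stationary])
print(span12)
```

The first list moves steadily toward the mean of 100, while the second stays at last December's value for every future December.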

When you encounter a span 12 difference, often the differenced series is not white noise but is 
instead a moving average of the form 

$$e_t - \beta e_{t-12}$$

For example, if 

$$Y_t - Y_{t-12} = e_t - .5e_{t-12}$$

you see that 

$$e_{t-12} = Y_{t-12} - Y_{t-24} + .5e_{t-24} = Y_{t-12} - Y_{t-24} + .5(Y_{t-24} - Y_{t-36} + .5e_{t-36})$$

If you continue in this fashion, you can express $Y_t$ as $e_t$ plus an infinite weighted sum of past Y 
values, namely 

$$Y_t = e_t + .5(Y_{t-12} + .5Y_{t-24} + \ldots)$$

Thus, the forecast for any future December is a weighted sum of past December values, with weights 
decreasing exponentially as you move further into the past. Although the forecast involves many past 
Decembers, the decreasing weights make it respond to recent changes. 
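
The exponentially decreasing weights can be verified by simulation. The sketch below is an illustration in Python (not SAS), with hypothetical function names; it assumes that all values before the start of the simulated series are 0, so that the repeated substitution terminates exactly.

```python
import random


def pi_weights(beta, n_terms):
    """Weights on Y_{t-12}, Y_{t-24}, ... implied by Y_t - Y_{t-12} = e_t - beta*e_{t-12}."""
    return [(1 - beta) * beta ** (k - 1) for k in range(1, n_terms + 1)]


def simulate(beta, n, seed=0, span=12):
    """Generate Y and e from the span 12 difference model with zero pre-sample values."""
    rng = random.Random(seed)
    e = [rng.gauss(0, 1) for _ in range(n)]
    y = []
    for t in range(n):
        y_lag = y[t - span] if t >= span else 0.0
        e_lag = e[t - span] if t >= span else 0.0
        y.append(y_lag + e[t] - beta * e_lag)
    return y, e


beta = 0.5
weights = pi_weights(beta, 4)          # weights on the four most recent Decembers
y, e = simulate(beta, 120)

t = 119
all_weights = pi_weights(beta, t // 12)
rebuilt = e[t] + sum(w * y[t - 12 * k] for k, w in enumerate(all_weights, start=1))

print(weights)
print(abs(rebuilt - y[t]))
```

With a coefficient of .5 the weights are .5, .25, .125, and so on, matching the expression above, and the rebuilt value agrees with the simulated one up to rounding error.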

Differencing over seasonal spans is indicated when the ACF at the seasonal lags dies off very slowly. 
Often this behavior is masked in the original ACF, which dies off slowly at all lags. In that case you 
should difference, as the ACF seems to indicate, by specifying the PROC ARIMA statement 

IDENTIFY VAR=Y(1); 


Now look at the ACF of the differenced series, considering only seasonal lags (12, 24, 36, and so on). 
If these ACF values die off very slowly, you want to take a span 12 difference in addition to the first 
difference. You accomplish this by specifying 


IDENTIFY VAR=Y(1,12); 

Note how the differencing specification works. For example, 

IDENTIFY VAR=Y(1,1); 

specifies a second difference 

$$(Y_t - Y_{t-1}) - (Y_{t-1} - Y_{t-2}) = Y_t - 2Y_{t-1} + Y_{t-2}$$

whereas the specification 

IDENTIFY VAR=Y(2); 

creates the span 2 difference 

$$Y_t - Y_{t-2}$$

Calling the span 1 and span 12 differenced series $V_t$, you create 

$$V_t = (Y_t - Y_{t-1}) - (Y_{t-12} - Y_{t-13})$$

and consider models for $V_t$. 
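
The meaning of the differencing lists can be sketched outside SAS. The Python function below is a hypothetical helper written for illustration; it shows what the notation computes and is not a description of how PROC ARIMA is implemented. The test series of 67 squared integers is made up, chosen only because its differences are easy to check by hand.

```python
def difference(series, *spans):
    """Apply successive differences, in the spirit of VAR=Y(span1, span2, ...)."""
    out = list(series)
    for span in spans:
        # Each pass replaces the series by out[t] - out[t - span].
        out = [out[t] - out[t - span] for t in range(span, len(out))]
    return out


y = [t * t for t in range(67)]       # hypothetical series with 67 observations

second_diff = difference(y, 1, 1)    # Y(1,1): Y_t - 2Y_{t-1} + Y_{t-2}
span2_diff = difference(y, 2)        # Y(2):   Y_t - Y_{t-2}
seasonal = difference(y, 1, 12)      # Y(1,12): span 1, then span 12

print(second_diff[:3], span2_diff[:3], len(seasonal))
```

A series of 67 observations loses 13 of them under the (1,12) specification, leaving 54, which is the count PROC ARIMA reports for the construction series later in this section.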

4.1.2 Model Identification 

If $V_t$ appears to be white noise, the model becomes 

$$Y_t = Y_{t-1} + (Y_{t-12} - Y_{t-13}) + e_t$$

Thus, with data through this November, you forecast this December’s Y as the November value 
($Y_{t-1}$) plus last year’s November-to-December change ($Y_{t-12} - Y_{t-13}$). 

More commonly, you find that the differenced series $V_t$ satisfies 

$$V_t = (1 - \theta_1 B)(1 - \theta_2 B^{12})e_t$$

This is called a seasonal multiplicative moving average. The meaning of a product of backshift 
factors like this is simply 

$$V_t = e_t - \theta_1 e_{t-1} - \theta_2 e_{t-12} - \delta e_{t-13}$$

where $\delta = -\theta_1\theta_2$. If you are not sure about the multiplicative structure, you can specify 
ESTIMATE Q=(1,12,13); 

and check to see if the third estimated moving average (MA) coefficient $\delta$ is approximately the 
negative of the product of the other two ($\theta_1\theta_2$). To specify the multiplicative structure, issue the 
PROC ARIMA statement 

ESTIMATE Q=(1)(12); 

After differencing, the intercept is probably 0, so you can use the NOCONSTANT option. You can 
fit seasonal multiplicative factors on the AR side also. For example, specifying 

ESTIMATE P=(1,2)(12) NOCONSTANT; 

causes the model 

$$(1 - \alpha_1 B - \alpha_2 B^2)(1 - \alpha_3 B^{12})V_t = e_t$$

to be fit to the data. 
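
A product of backshift factors is ordinary polynomial multiplication in B, which is easy to check numerically. The sketch below is a Python illustration (not SAS) with a hypothetical helper and made-up coefficient values of .4 and .6 for the two multiplicative MA factors discussed above.

```python
def multiply(p, q):
    """Multiply two polynomials in B given as coefficient lists (index = power of B)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


theta1, theta2 = 0.4, 0.6                       # hypothetical MA coefficients
nonseasonal = [1.0, -theta1]                    # 1 - theta1*B
seasonal = [1.0] + [0.0] * 11 + [-theta2]       # 1 - theta2*B^12

product = multiply(nonseasonal, seasonal)
nonzero = {lag: round(c, 6) for lag, c in enumerate(product) if c != 0.0}
delta = -product[13]                            # coefficient written as -delta in the text

print(nonzero, delta)
```

Only lags 0, 1, 12, and 13 carry nonzero coefficients, and the lag 13 coefficient is the product of the other two MA coefficients, so the value written as delta in the expanded form is the negative of that product.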

Consider the monthly number of U.S. masonry and electrical construction workers in thousands (U.S. 
Bureau of Census 1982). You issue the following SAS statements to plot the data and compute the 
ACF for the original series, first differenced series, and first and seasonally differenced series: 

PROC GPLOT DATA=CONST; 
   PLOT CONSTRCT*DATE / HMINOR=0 VMINOR=0; 
   TITLE 'CONSTRUCTION REVIEW'; 
   TITLE2 'CONSTRUCTION WORKERS IN THOUSANDS'; 
   SYMBOL1 L=1 I=JOIN C=BLACK V=NONE; 
RUN; 

PROC ARIMA DATA=CONST; 
   IDENTIFY VAR=CONSTRCT NLAG=36; 
   IDENTIFY VAR=CONSTRCT(1) NLAG=36; 
   IDENTIFY VAR=CONSTRCT(1,12) NLAG=36; 
RUN; 


The plot is shown in Output 4.1. The ACFs are shown in Output 4.2. The plot of the data displays 
nonstationary behavior (nonconstant mean). The original ACF ❶ shows slow decay, indicating a first 
differencing. The ACF of the first differenced series ❷ shows slow decay at the seasonal lags, 
indicating a span 12 difference. The Q statistics ❸ on the CONSTRCT(1,12) differenced variable 
indicate that no AR or MA terms are needed. 


Output 4.1 Plotting the Original Data 

[Plot of CONSTRCT against DATE, titled CONSTRUCTION REVIEW, CONSTRUCTION WORKERS IN THOUSANDS.] 

Output 4.2 Computing the ACF with the IDENTIFY Statement: PROC ARIMA 

CONSTRUCTION REVIEW 
CONSTRUCTION WORKERS IN THOUSANDS 

The ARIMA Procedure 

Name of Variable = CONSTRCT 

Mean of Working Series    585.4149 
Standard Deviation        50.65318 
Number of Observations          67 

Autocorrelations ❶ 

Lag    Covariance    Correlation    Std Error 
  0      2565.745        1.00000            0 
  1      2213.592        0.86275     0.122169 
  2      1746.293        0.68062     0.192729 
  3      1296.936        0.50548     0.225771 
  4       907.000        0.35350     0.242074 
  5       598.110        0.23311     0.249660 
  6       443.389        0.17281     0.252887 
  7       418.306        0.16304     0.254644 
  8       541.150        0.21091     0.256197 
  9       712.443        0.27767     0.258776 
 10       922.830        0.35967     0.263185 
 11      1126.964        0.43923     0.270422 
 12      1201.001        0.46809     0.280868 
 13       894.577        0.34866     0.292280 
 14       464.059        0.18087     0.298423 
 15     78.980204        0.03078     0.300055 

Inverse Autocorrelations 

Lag    Correlation 
  1       -0.48370 
  2       -0.02286 
  3       -0.00329 
  4        0.01486 
  5        0.01432 
  6       -0.02317 
  7        0.04451 
  8       -0.04284 
  9        0.01813 
 10        0.00084 
 11        0.10220 
 12       -0.23143 
 13        0.07845 
 14        0.02590 
 15        0.03720 

Partial Autocorrelations 

Lag    Correlation 
  1        0.86275 
  2       -0.24922 
  3       -0.05442 
  4       -0.03252 
  5       -0.00610 
  6        0.11891 
  7        0.08462 
  8        0.17698 
  9        0.06457 
 10        0.13938 
 11        0.11326 
 12       -0.05743 
 13       -0.46306 
 14       -0.11123 
 15        0.03114 

Autocorrelation Check for White Noise 

 To    Chi-          Pr > 
Lag    Square   DF   ChiSq    ---------------Autocorrelations--------------- 
  6    119.03    6   <.0001    0.863   0.681   0.505   0.354   0.233   0.173 
 12    175.54   12   <.0001    0.163   0.211   0.278   0.360   0.439   0.468 
 18    199.05   18   <.0001    0.349   0.181   0.031  -0.095  -0.196  -0.248 
 24    215.31   24   <.0001   -0.261  -0.227  -0.177  -0.111  -0.040  -0.004 
 30    296.18   30   <.0001   -0.078  -0.193  -0.284  -0.368  -0.436  -0.468 
 36    376.88   36   <.0001   -0.463  -0.414  -0.349  -0.259  -0.157  -0.082 


Name of Variable = CONSTRCT 


Period(s) of Differencing 1 
Mean of Working Series 2.113636 
Standard Deviation 19.56132 
Number of Observations 66 
Observation(s) eliminated by differencing 1 

Autocorrelations ❷ 

Lag    Covariance    Correlation    Std Error 
  0       382.645        1.00000            0 
  1       150.101        0.39227     0.123091 
  2     34.261654        0.08954     0.140764 
  3    -17.049788        -.04456     0.141624 
  4    -50.307780        -.13147     0.141836 
  5      -104.359        -.27273     0.143671 
  6      -113.459        -.29651     0.151312 
  7      -104.599        -.27336     0.159874 
  8    -53.315553        -.13933     0.166805 
  9    -34.163118        -.08928     0.168559 
 10     -5.892301        -.01540     0.169274 
 11       104.746        0.27374     0.169296 
 12       258.268        0.67495     0.175874 
 13       114.671        0.29968     0.211510 
 14     31.767495        0.08302     0.217849 
 15     -9.281516        -.02426     0.218328 

Inverse Autocorrelations 

Lag    Correlation 
  1       -0.22458 
  2       -0.08489 
  3       -0.02789 
  4        0.08315 
  5        0.02735 
  6       -0.02083 
  7        0.07075 
  8       -0.07942 
  9        0.06891 
 10        0.06671 
 11        0.07826 
 12       -0.44727 
 13        0.06835 
 14        0.08653 
 15       -0.04643 

Partial Autocorrelations 

Lag    Correlation 
  1        0.39227 
  2       -0.07604 
  3       -0.06244 
  4       -0.10017 
  5       -0.21696 
  6       -0.14352 
  7       -0.15335 
  8       -0.03507 
  9       -0.11707 
 10       -0.06943 
 11        0.23978 
 12        0.56027 
 13       -0.20675 
 14       -0.05794 
 15       -0.00776 

Autocorrelation Check for White Noise 

 To    Chi-          Pr > 
Lag    Square   DF   ChiSq    ---------------Autocorrelations--------------- 
  6     24.63    6   0.0004    0.392   0.090  -0.045  -0.131  -0.273  -0.297 
 12     76.44   12   <.0001   -0.273  -0.139  -0.089  -0.015   0.274   0.675 
 18     93.66   18   <.0001    0.300   0.083  -0.024  -0.071  -0.202  -0.226 
 24    124.70   24   <.0001    0.211  -0.124  -0.081  -0.058   0.181   0.442 
 30    137.06   30   <.0001    0.214   0.091  -0.022  -0.016  -0.148  -0.172 
 36    171.23   36   <.0001    0.224  -0.161  -0.117  -0.125   0.140   0.341 

Name of Variable = CONSTRCT 

Period(s) of Differencing                       1,12 
Mean of Working Series                      -1.70926 
Standard Deviation                          9.624434 
Number of Observations                            54 
Observation(s) eliminated by differencing         13 

Autocorrelations 

Lag    Covariance    Correlation    Std Error 
  0     92.629729        1.00000            0 
  1     -4.306435        -.04649     0.136083 
  2      1.797140        0.01940     0.136377 
  3     -0.505176        -.00545     0.136428 
  4    -12.210895        -.13182     0.136432 
  5     -4.617964        -.04985     0.138770 
  6     -9.764872        -.10542     0.139102 
  7      1.989406        0.02148     0.140573 
  8     -6.583051        -.07107     0.140634 
  9      1.441982        0.01557     0.141298 
 10      5.785042        0.06245     0.141329 
 11      1.738408        0.01877     0.141840 
 12    -10.793891        -.11653     0.141886 
 13     -7.469517        -.08064     0.143647 
 14      7.038939        0.07599     0.144483 
 15      7.022723        0.07582     0.145221 

Inverse Autocorrelations 

Lag    Correlation 
  1        0.13117 
  2       -0.09234 
  3       -0.03146 
  4        0.11189 
  5        0.11239 
  6        0.12587 
  7        0.05514 
  8        0.07675 
  9       -0.03391 
 10       -0.04485 
 11        0.09691 
 12        0.22039 
 13        0.08367 
 14       -0.09861 
 15       -0.07616 

Partial Autocorrelations 

Lag    Correlation 
  1       -0.04649 
  2        0.01728 
  3       -0.00377 
  4       -0.13291 
  5       -0.06308 
  6       -0.10861 
  7        0.00986 
  8       -0.08828 
  9       -0.01180 
 10        0.03282 
 11        0.01461 
 12       -0.15265 
 13       -0.10793 
 14        0.06736 
 15        0.10476 

Autocorrelation Check for White Noise ❸ 

 To    Chi-          Pr > 
Lag    Square   DF   ChiSq    ---------------Autocorrelations--------------- 
  6      2.05    6   0.9149   -0.046   0.019  -0.005  -0.132  -0.050  -0.105 
 12      3.70   12   0.9883    0.021  -0.071   0.016   0.062   0.019  -0.117 
 18      5.97   18   0.9963   -0.081   0.076   0.076  -0.037  -0.090   0.040 
 24     19.59   24   0.7196    0.104   0.084  -0.011   0.045  -0.249  -0.240 
 30     26.63   30   0.6424    0.135  -0.182   0.051   0.053   0.055   0.066 
 36     28.38   36   0.8134   -0.017  -0.027   0.021  -0.038   0.084  -0.035 

To forecast the seasonal data, use the following statements: 

PROC ARIMA DATA=CONST; 
   IDENTIFY VAR=CONSTRCT(1,12) NOPRINT; 
   ESTIMATE NOCONSTANT METHOD=ML; 
   FORECAST LEAD=12 INTERVAL=MONTH ID=DATE OUT=OUTF; 
RUN; 


The results are shown in Output 4.3. 

The model 

$$(1 - B)(1 - B^{12})Y_t = (1 - \theta_1 B)(1 - \theta_2 B^{12})e_t$$

is known as the airline model. Its popularity started when Box and Jenkins (1976) used it to model 
sales of international airline tickets on a logarithmic scale. Output 4.4 shows plots of the original 
and log scale data from Box and Jenkins’s text. 


Output 4.3 Forecasting Seasonal Data with the IDENTIFY, ESTIMATE, and FORECAST Statements: 
PROC ARIMA 


CONSTRUCTION REVIEW 
CONSTRUCTION WORKERS IN THOUSANDS 

The ARIMA Procedure 


Variance Estimate 95.5513 
Std Error Estimate 9.775034 
AIC 399.4672 
SBC 399.4672 
Number of Residuals 54 


Autocorrelation Check of Residuals 

 To    Chi-          Pr > 
Lag    Square   DF   ChiSq    ---------------Autocorrelations--------------- 
  6      0.93    6   0.9880   -0.019   0.048   0.031  -0.089  -0.011  -0.061 
 12      2.62   12   0.9977    0.057  -0.033   0.052   0.093   0.050  -0.079 
 18      5.22   18   0.9985   -0.046   0.108   0.102  -0.003  -0.054   0.078 
 24     16.49   24   0.8695    0.135   0.112   0.020   0.072  -0.209  -0.195 

Model for Variable CONSTRCT 
Period(s) of Differencing 1,12 
No mean term in this model. 

Forecasts for Variable CONSTRCT 

Obs    Forecast    Std Error    95% Confidence Limits 
 68    588.9000       9.7750    569.7413     608.0587 
 69    585.0000      13.8240    557.9055     612.0945 
 70    574.6000      16.9309    541.4161     607.7839 
       [more output lines] 
 77    529.5000      30.9114    468.9148     590.0852 
 78    532.8000      32.4201    469.2577     596.3423 
 79    543.6000      33.8617    477.2323     609.9677 
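
The forecast standard errors in Output 4.3 follow directly from the differencing operator, because the fitted model has no AR or MA terms. The Python sketch below is an illustration only (the helper names are hypothetical and this is not how PROC ARIMA is coded); it builds the weights on past shocks from the recursion implied by the span 1 and span 12 differences and takes the Std Error Estimate of 9.775034 from the output.

```python
import math
from statistics import NormalDist


def psi_weights(n):
    """Shock weights for (1 - B)(1 - B^12) Y_t = e_t, from Y_t = Y_{t-1} + Y_{t-12} - Y_{t-13} + e_t."""
    psi = []
    for j in range(n):
        if j == 0:
            psi.append(1.0)
            continue
        value = psi[j - 1]
        if j >= 12:
            value += psi[j - 12]
        if j >= 13:
            value -= psi[j - 13]
        psi.append(value)
    return psi


def forecast_std_errors(sigma, lead):
    """Standard error of the k-step-ahead forecast error for k = 1, ..., lead."""
    psi = psi_weights(lead)
    return [sigma * math.sqrt(sum(w * w for w in psi[:k])) for k in range(1, lead + 1)]


sigma = 9.775034                       # Std Error Estimate from Output 4.3
std_errors = forecast_std_errors(sigma, 12)

z = NormalDist().inv_cdf(0.975)        # two-sided 95% normal quantile
lower = 588.9 - z * std_errors[0]      # limits for observation 68
upper = 588.9 + z * std_errors[0]

print([round(s, 4) for s in std_errors])
print(round(lower, 4), round(upper, 4))
```

For leads 1 through 12 every shock weight equals 1, so the standard error is the residual standard error times the square root of the lead, which reproduces the Std Error column shown for observations 68 through 79.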

Output 4.4 Plotting the Original and Log Transformed Box and Jenkins Airline Data 

[Two plots titled SERIES G, INTERNATIONAL AIRLINES PASSENGERS: AIR against DATE (original scale) and LAIR against DATE (log scale).] 

Now analyze the logarithms, which have the more stable seasonal pattern, using these SAS 
statements: 

PROC ARIMA DATA=AIRLINE; 

IDENTIFY VAR=LAIR; 

IDENTIFY VAR=LAIR(1); 

TITLE 'SERIES G'; 

TITLE2 'INTERNATIONAL AIRLINES PASSENGERS'; 

RUN; 

The results are shown in Output 4.5. It is hard to detect seasonality in the ACF of the original 
series ❶ because all the values are so near 1. The slow decay is much more evident here than in the 
construction example. Once you take the first difference, you obtain the ACF ❷. Looking at the 
seasonal lags (12, 24), you see little decay, indicating you should consider a span 12 difference. To 
create the variable 

$$V_t = (Y_t - Y_{t-1}) - (Y_{t-12} - Y_{t-13})$$

and its ACF, inverse autocorrelation function (IACF), and partial autocorrelation function (PACF), 
issue the following SAS statements: 

PROC ARIMA DATA=AIRLINE; 

IDENTIFY VAR=LAIR(1,12); 

The model is identified from the autocorrelations. Identification depends on pattern recognition in the 
plot of the ACF values against the lags. The nonzero ACF values are called spikes to draw to mind 
the plots PROC ARIMA produces in the IDENTIFY stage. For the airline model, if $\theta_1 > 0$ and 
$\theta_2 > 0$, the theoretical autocorrelations of the series 

$$V_t = (1 - \theta_1 B)(1 - \theta_2 B^{12})e_t$$

should have 

□ a (negative) spike at lag 1 

□ a (negative) spike at lag 12 

□ equal (and positive) spikes at lags 11 and 13 called side lobes of the lag 12 spike 

□ all other lag correlations 0. 
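
The spike pattern listed above can be reproduced from the MA coefficients. The following Python sketch is for illustration only (it is not SAS, the helper names are hypothetical, and the coefficient values .4 and .6 are made up); it expands the two MA factors and computes the theoretical autocorrelations of the resulting moving average.

```python
def multiply(p, q):
    """Multiply two polynomials in B given as coefficient lists (index = power of B)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def ma_autocorrelations(psi, max_lag):
    """Theoretical autocorrelations of a moving average with shock weights psi."""
    gamma0 = sum(w * w for w in psi)
    rho = []
    for h in range(max_lag + 1):
        cov = sum(psi[j] * psi[j + h] for j in range(len(psi) - h))
        rho.append(cov / gamma0)
    return rho


theta1, theta2 = 0.4, 0.6     # hypothetical positive MA coefficients
psi = multiply([1.0, -theta1], [1.0] + [0.0] * 11 + [-theta2])
rho = ma_autocorrelations(psi, 15)

spikes = {lag: round(r, 4) for lag, r in enumerate(rho) if lag > 0 and abs(r) > 1e-12}
print(spikes)
```

Only lags 1, 11, 12, and 13 appear: the lag 1 and lag 12 values are negative, and the lag 11 and lag 13 side lobes are equal and positive.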

Output 4.5 Identifying the Logarithms with the IDENTIFY Statement: PROC ARIMA 

SERIES G 
INTERNATIONAL AIRLINES PASSENGERS 

The ARIMA Procedure 

Name of Variable = LAIR 

Mean of Working Series    5.542176 
Standard Deviation        0.439921 
Number of Observations         144 

Autocorrelations ❶ 

Lag    Covariance    Correlation    Std Error 
  0      0.193530        1.00000            0 
  1      0.184571        0.95370     0.083333 
  2      0.173968        0.89892     0.139918 
  3      0.164656        0.85080     0.175499 
  4      0.156455        0.80843     0.202123 
  5      0.150741        0.77890     0.223452 
  6      0.146395        0.75644     0.241572 
  7      0.142748        0.73760     0.257496 
  8      0.140722        0.72713     0.271773 
  9      0.141983        0.73365     0.284963 
 10      0.144036        0.74426     0.297791 
 11      0.146701        0.75803     0.310440 
 12      0.147459        0.76194     0.323038 
 13      0.138665        0.71650     0.335286 
 14      0.128319        0.66304     0.345756 
 15      0.119672        0.61836     0.354475 

Inverse Autocorrelations 

Lag    Correlation 
  1       -0.50387 
  2       -0.00329 
  3        0.02976 
  4       -0.01239 
  5       -0.02237 
  6        0.01070 
  7       -0.00970 
  8        0.01922 
  9        0.01018 
 10       -0.00820 
 11        0.15344 
 12       -0.34183 
 13        0.14016 
 14        0.07699 
 15       -0.06626 

Partial Autocorrelations 

Lag    Correlation 
  1        0.95370 
  2       -0.11757 
  3        0.05423 
  4        0.02376 
  5        0.11582 
  6        0.04437 
  7        0.03803 
  8        0.09962 
  9        0.20410 
 10        0.06391 
 11        0.10604 
 12       -0.04247 
 13       -0.48543 
 14       -0.03435 
 15        0.04222 

Autocorrelation Check for White Noise 

 To    Chi-           Pr > 
Lag    Square    DF   ChiSq    ---------------Autocorrelations--------------- 
  6     638.37    6   <.0001    0.954   0.899   0.851   0.808   0.779   0.756 
 12    1157.62   12   <.0001    0.738   0.727   0.734   0.744   0.758   0.762 
 18    1521.94   18   <.0001    0.717   0.663   0.618   0.576   0.544   0.519 
 24    1785.32   24   <.0001    0.501   0.490   0.498   0.506   0.517   0.520 

Name of Variable = LAIR 

Period(s) of Differencing                          1 
Mean of Working Series                       0.00944 
Standard Deviation                          0.106183 
Number of Observations                           143 
Observation(s) eliminated by differencing          1 

Autocorrelations ❷ 

Lag    Covariance    Correlation    Std Error 
  0      0.011275        1.00000            0 
  1     0.0022522        0.19975     0.083624 
  2    -0.0013542        -.12010     0.086897 
  3    -0.0016999        -.15077     0.088050 
  4    -0.0036313        -.32207     0.089837 
  5    -0.0009468        -.08397     0.097578 
  6    0.00029065        0.02578     0.098082 
  7    -0.0012511        -.11096     0.098130 
  8    -0.0037965        -.33672     0.099003 
  9    -0.0013032        -.11559     0.106712 
 10    -0.0012320        -.10927     0.107584 
 11     0.0023209        0.20585     0.108357 
 12     0.0094870        0.84143     0.111058 
 13     0.0024251        0.21509     0.149118 
 14    -0.0015734        -.13955     0.151272 
 15    -0.0013078        -.11600     0.152169 
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Output 4.5 Identifying the Logarithms with the IDENTIFY Statement: PROC ARIMA (continued) 




Inverse 

Autocorrelations 





Lag 

Correlation 

-19 8 

76543210123456 

7 

8 

9 

1 

1 

0.17650 

1 

1 * * * * 

■ 1 




1 

2 

0.09526 

1 

1 * * 




1 

3 

0.34062 

1 

1 ******* 

* 1 




1 

4 

0.28364 

1 

1 ****** 

■ 1 




1 

5 

0.05975 

1 

■ 1* ■ 




1 

6 

0.13322 

1 

1 * * * 

* 1 




1 

7 

0.24104 

1 

1 ***** 

* 1 




1 

8 

0.11930 

1 

1 * * 




1 

9 

-0.04769 

1 

■ * 1 ■ 




1 

10 

0.19042 

1 

1 * * * * 

■ 1 




1 

11 

0.10362 

1 

1 * * 




1 

12 

-0.27362 

1 

***** 1 




1 

13 

0.02062 

1 





1 

14 

0.15054 

1 

1 * * * 

■ 1 




1 

15 

-0.03827 

1 

■ * 1 ■ 




1 



                 Partial Autocorrelations

Lag    Correlation
  1        0.19975
  2       -0.16665
  3       -0.09588
  4       -0.31089
  5        0.00778
  6       -0.07455
  7       -0.21028
  8       -0.49476
  9       -0.19229
 10       -0.53188
 11       -0.30229
 12        0.58604
 13        0.02598
 14       -0.18119
 15        0.12004

Output 4.5 Identifying the Logarithms with the IDENTIFY Statement: PROC ARIMA (continued)

                     Autocorrelation Check for White Noise

 To      Chi-            Pr >
Lag    Square    DF     ChiSq    ---------------Autocorrelations---------------
  6     27.95     6    <.0001    0.200  -0.120  -0.151  -0.322  -0.084   0.026
 12    169.89    12    <.0001   -0.111  -0.337  -0.116  -0.109   0.206   0.841
 18    195.75    18    <.0001    0.215  -0.140  -0.116  -0.279  -0.052   0.012
 24    321.53    24    <.0001    0.114  -0.337  -0.107  -0.075   0.199   0.737




                        SERIES G
            INTERNATIONAL AIRLINES PASSENGERS

                  The ARIMA Procedure

                Name of Variable = LAIR

Period(s) of Differencing                      1,12
Mean of Working Series                     0.000291
Standard Deviation                         0.045673
Number of Observations                          131
Observation(s) eliminated by differencing        13







                     Autocorrelations

Lag    Covariance    Correlation    Std Error
  0     0.0020860        1.00000            0
  1    -0.0007116        -.34112     0.087370
  2    0.00021913        0.10505     0.097006
  3    -0.0004217        -.20214     0.097870
  4    0.00004456        0.02136     0.101007
  5    0.00011610        0.05565     0.101042
  6    0.00006426        0.03080     0.101275
  7    -0.0001159        -.05558     0.101347
  8    -1.5867E-6        -.00076     0.101579
  9    0.00036791        0.17637     0.101579
 10    -0.0001593        -.07636     0.103891
 11    0.00013431        0.06438     0.104318
 12    -0.0008065        -.38661     0.104621
 13    0.00031624        0.15160     0.115011
 14    -0.0001202        -.05761     0.116526
 15    0.00031200        0.14957     0.116744

Output 4.5 Identifying the Logarithms with the IDENTIFY Statement: PROC ARIMA (continued)

                 Inverse Autocorrelations

Lag    Correlation
  1        0.32632
  2        0.09594
  3        0.09992
  4        0.10889
  5       -0.12127
  6       -0.16601
  7       -0.05979
  8        0.02949
  9       -0.08480
 10        0.01413
 11        0.10508
 12        0.37985
 13        0.12446
 14        0.05655
 15        0.05144



                 Partial Autocorrelations

Lag    Correlation
  1       -0.34112
  2       -0.01281
  3       -0.19266
  4       -0.12503
  5        0.03309
  6        0.03468
  7       -0.06019
  8       -0.02022
  9        0.22558
 10        0.04307
 11        0.04659
 12       -0.33869
 13       -0.10918
 14       -0.07684
 15       -0.02175




                     Autocorrelation Check for White Noise

 To      Chi-            Pr >
Lag    Square    DF     ChiSq    ---------------Autocorrelations---------------
  6     23.27     6    0.0007   -0.341   0.105  -0.202   0.021   0.056   0.031
 12     51.47    12    <.0001   -0.056  -0.001   0.176  -0.076   0.064  -0.387
 18     62.44    18    <.0001    0.152  -0.058   0.150  -0.139   0.070   0.016
 24     74.27    24    <.0001    0.011  -0.117   0.039  -0.091   0.223  -0.018


The pattern that follows represents the ACF of

$$V_t = (1 - \theta_1 B)(1 - \theta_2 B^{12}) e_t$$

or the IACF of

$$(1 - \theta_1 B)(1 - \theta_2 B^{12}) V_t = e_t$$

[Figure: schematic correlation pattern plotted against lag, with a spike at lag 1 and a spike with side lobes around the seasonal lag 12]



When you compare this pattern to the ACF of the LAIR(1,12) variable, you find reasonable 
agreement. If the signs of the parameters are changed, the spikes and side lobes have different signs 
but remain at the same lags. The spike and side lobes at the seasonal lag are characteristic of seasonal 
multiplicative models. Note that if the multiplicative factor is on the AR side, this pattern appears in 
the IACF instead of in the ACF. In that case, the IACF and PACF behave differently and the IACF is 
easier to interpret. 
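
As a worked check on the spike-and-side-lobe pattern, the multiplicative moving average model above can be expanded term by term. This sketch assumes the shocks $e_t$ are white noise with variance $\sigma^2$. The numerical values are rounded and use the estimates 0.37727 and 0.57236 that are reported for the airline model in Output 4.6.

$$
\begin{aligned}
V_t &= e_t - \theta_1 e_{t-1} - \theta_2 e_{t-12} + \theta_1 \theta_2 e_{t-13} \\
\gamma(0) &= (1 + \theta_1^2)(1 + \theta_2^2)\sigma^2 \\
\gamma(1) &= -\theta_1 (1 + \theta_2^2)\sigma^2, \qquad \gamma(12) = -\theta_2 (1 + \theta_1^2)\sigma^2 \\
\gamma(11) &= \gamma(13) = \theta_1 \theta_2 \sigma^2 \\
\rho(1) &= \frac{-\theta_1}{1 + \theta_1^2} \approx -0.33, \qquad
\rho(12) = \frac{-\theta_2}{1 + \theta_2^2} \approx -0.43, \qquad
\rho(11) = \rho(13) = \rho(1)\rho(12) \approx 0.14
\end{aligned}
$$

All other autocorrelations are zero under this model. For comparison, the sample autocorrelations of LAIR(1,12) in Output 4.5 are -0.341 at lag 1, -0.387 at lag 12, 0.064 at lag 11, and 0.152 at lag 13.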

If the model is changed to

$$V_t - \alpha V_{t-12} = (1 - \theta_1 B)(1 - \theta_2 B^{12}) e_t$$

the spike and side lobes are visible at the seasonal lag (for example, 12) and its multiples (24, 36, and
so on), but the magnitudes of the spikes at the multiples decrease exponentially at rate $\alpha$. If the
decay is extremely slow, an additional seasonal difference is needed ($\alpha = 1$). If the pattern appears in
the IACF, the following model is indicated:

$$(1 - \theta_1 B)(1 - \theta_2 B^{12}) V_t = (1 - \alpha B^{12}) e_t$$

The SAS code for the airline data is 

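PROC ARIMA DATA=AIRLINE;
   /* Difference the logged series at lags 1 and 12; suppress IDENTIFY output */
   IDENTIFY VAR=LAIR(1,12) NOPRINT;
   /* Multiplicative MA factors at lags 1 and 12, with no mean term */
   ESTIMATE Q=(1)(12) NOCONSTANT;
   /* Forecast 12 months ahead; save forecasts and residuals in FORE */
   FORECAST LEAD=12 OUT=FORE ID=DATE INTERVAL=MONTH;
RUN;

/* Plot the series, forecasts, and 95% limits from observation 120 onward */
PROC GPLOT DATA=FORE(FIRSTOBS=120);
   PLOT (LAIR FORECAST L95 U95)*DATE / OVERLAY HMINOR=0;
   SYMBOL1 V=A L=1 I=JOIN C=BLACK;
   SYMBOL2 V=F L=2 I=JOIN C=BLACK;
   SYMBOL3 V=L C=BLACK I=NONE;
   SYMBOL4 V=U C=BLACK I=NONE;
RUN;

/* Keep only the observations that have a residual */
DATA FORE;
   SET FORE;
   IF RESIDUAL NE .;
RUN;

/* Periodogram (P) and white noise tests (WHITETEST) for the residuals */
PROC SPECTRA P WHITETEST DATA=FORE OUT=RESID;
   VAR RESIDUAL;
RUN;

PROC GPLOT DATA=RESID;
   PLOT P_01*FREQ/HMINOR=0;
   SYMBOL1 F=TRIPLEX V=* I=JOIN C=BLACK;
RUN;

The results are shown in Output 4.6 and Output 4.7.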


Output 4.6 Fitting the Airline Model: PROC ARIMA

                        SERIES G
            INTERNATIONAL AIRLINES PASSENGERS

                  The ARIMA Procedure

          Conditional Least Squares Estimation

                         Standard              Approx
Parameter    Estimate       Error    t Value   Pr > |t|    Lag
MA1,1         0.37727     0.08196       4.60     <.0001      1
MA2,1         0.57236     0.07802       7.34     <.0001     12

Variance Estimate        0.00141
Std Error Estimate      0.037554
AIC                     -486.133
SBC                     -480.383
Number of Residuals          131

* AIC and SBC do not include log determinant.

    Correlations of Parameter Estimates

Parameter      MA1,1      MA2,1
MA1,1          1.000     -0.091
MA2,1         -0.091      1.000

                      Autocorrelation Check of Residuals

 To      Chi-            Pr >
Lag    Square    DF     ChiSq    ---------------Autocorrelations---------------
  6      5.15     4    0.2723    0.010   0.028  -0.119  -0.100   0.081   0.077
 12      7.89    10    0.6400   -0.049  -0.023   0.114  -0.045   0.025  -0.023
 18     11.98    16    0.7452    0.012   0.036   0.064  -0.136   0.055   0.011
 24     22.56    22    0.4272   -0.098  -0.096  -0.031  -0.021   0.214   0.013

Model for Variable LAIR
Period(s) of Differencing    1,12
No mean term in this model.

Moving Average Factors
Factor 1:  1 - 0.37727 B**(1)
Factor 2:  1 - 0.57236 B**(12)

Output 4.6 Fitting the Airline Model: PROC ARIMA (continued)


SERIES G 

INTERNATIONAL AIRLINES PASSENGERS 

The ARIMA Procedure 
Forecasts for Variable LAIR 

Obs Forecast Std Error 95% Confidence Limits 

145 6.1095 0.0376 6.0359 6.1831 

146 6.0536 0.0442 5.9669 6.1404 

147 6.1728 0.0500 6.0747 6.2709 

[more output lines] 

154 6.2081 0.0796 6.0521 6.3641 

155 6.0631 0.0829 5.9005 6.2256 

156 6.1678 0.0862 5.9989 6.3367 

SERIES G 

INTERNATIONAL AIRLINES PASSENGERS 

The SPECTRA Procedure 

Test for White Noise for Variable RESIDUAL 

M = 65 

Max(P(*)) 0.0102 

Sum(P(*)) 0.181402 


Fisher's Kappa: M*MAX(P(*))/SUM(P(*)) 

Kappa 3.655039 

Bartlett's Kolmogorov-Smirnov Statistic: 
Maximum absolute difference of the standardized 
partial sums of the periodogram and the CDF of a 
uniform(0,1) random variable. 

Test Statistic 0.089019 

Approximate P-Value 0.6816

Output 4.7 Plotting the Forecasts and the Periodogram: PROC ARIMA and PROC SPECTRA

[Plots: FORECASTS (LAIR against DATE) and PERIODOGRAM OF RESIDUAL (P_01)]

PROC SPECTRA is also used to search for hidden periodicities in the airline residuals. No
periodicities are indicated in the periodogram plot or in the white noise tests produced by PROC 
SPECTRA. Refer to Chapter 7, “Spectral Analysis,” for more information on PROC SPECTRA. 


4.2 Models with Explanatory Variables 

Sometimes you can improve forecasts by relating the series of interest to other explanatory variables. 
Obviously, forecasting in such situations requires knowledge (or at least forecasts) of future values of 
such variables. The nature of the explanatory variables and of the model relating them to the target 
series determines the optimal forecasting method. Explanatory variables are addressed in Chapter 2, 
“Simple Models: Autoregression.” There, they are deterministic, meaning that their future values are 
determined without error. Seasonal indicator variables and time t are deterministic. Explanatory 
variables like interest rates and unemployment are not deterministic because their future values are 
unknown. 

Chapter 2 assumes that the relationship between the target series $Y_t$ and the explanatory series
$X_{1t}$, $X_{2t}$, . . . , $X_{kt}$ satisfies the usual regression model assumptions

$$Y_t = \beta_0 + \beta_1 X_{1t} + \cdots + \beta_k X_{kt} + e_t$$

where $e_t$ is white noise. The Durbin-Watson statistic is used in Chapter 2 to detect departures from
the assumptions on $e_t$. The following methods are appropriate when the Durbin-Watson statistic from
PROC REG or PROC GLM shows significant autocorrelation. Recall that if the regression analysis 
from PROC REG or PROC GLM shows no autocorrelation and if known future values (as opposed 
to forecasts) are available for all Xs, you can forecast with appropriate prediction intervals by 

□ supplying future Xs and missing values (.) for future Ys 

□ regressing Y on the Xs with the CLI option in the MODEL statement or the 
keywords U95=, L95= in the OUTPUT statement. 
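
The two steps above can be sketched as follows. The data set names HIST and FUTURE, the variables Y, X1, and X2, and the output variable names are hypothetical; only the CLI option and the L95= and U95= keywords come from the text. The DW option is added here so that the Durbin-Watson statistic is printed with the fit.

```sas
/* Stack the history and the future rows; Y is missing (.) in FUTURE */
DATA ALL;
   SET HIST FUTURE;
RUN;

PROC REG DATA=ALL;
   /* DW prints the Durbin-Watson statistic; CLI prints prediction limits */
   MODEL Y = X1 X2 / DW CLI;
   /* Rows with missing Y are left out of the fit but still get predictions */
   OUTPUT OUT=PRED P=YHAT L95=LOWER U95=UPPER;
RUN;
QUIT;
```

The intervals from this sketch are appropriate only under the conditions stated above: no autocorrelation in the errors and known future values of the Xs.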

This chapter combines regression with time series errors to provide a richer class of forecasting 
models. Three cases are delineated below, presented in order of increasing complexity. Examples are 
included, and special cases are highlighted. 


4.2.1 Case 1: Regression with Time Series Errors 

The model is 

$$Y_t = \beta_0 + \beta_1 X_{1t} + \beta_2 X_{2t} + \cdots + \beta_k X_{kt} + Z_t$$

where $Z_t$ is an ARIMA time series. This is a typical regression except that you allow for
autocorrelation in the error term Z. The Y series does not depend on lagged values of the Xs. If the
error series is purely autoregressive of order p, the SAS code 

PROC AUTOREG DATA=EXAMP; 

MODEL Y=X1 X2 X3 / NLAG=P; 

RUN;

properly fits a model to k = 3 explanatory variables. Because PROC ARIMA can do this and can also
accommodate mixed models and differencing, it is used instead of PROC AUTOREG in the analyses 
below. 
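
As a rough PROC ARIMA counterpart of the PROC AUTOREG step, the following sketch fits the same regression with an autoregressive error. The order P=2 and the choice of METHOD=ML are assumptions made only for illustration; the CROSSCOR= and INPUT= options are the ones used later in this chapter.

```sas
PROC ARIMA DATA=EXAMP;
   /* Name the target series and the explanatory series */
   IDENTIFY VAR=Y CROSSCOR=(X1 X2 X3) NOPRINT;
   /* Regression on X1-X3 with an AR(2) error (order chosen for illustration) */
   ESTIMATE INPUT=(X1 X2 X3) P=2 METHOD=ML;
RUN;
```

Replacing P=2 with, for example, P=1 Q=1 gives a mixed error model, which is the kind of extension that PROC AUTOREG does not provide.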

In case 1, forecasts of Y and forecast intervals are produced whenever future values of the Xs are 
supplied. If these future Xs are user-supplied forecasts, the procedure cannot incorporate the 
uncertainty of these future Xs into the intervals around the forecasts of Y. Thus, the Y forecast 
intervals are too narrow. Valid intervals are produced when you supply future values of deterministic 
Xs or when PROC ARIMA forecasts the Xs in a transfer function setting as in cases 2 and 3. 


4.2.2 Case 1A: Intervention 

If one of the X variables is an indicator variable (each value 1 or 0), the modeling above is called 
intervention analysis. The reason for this term is that X usually changes from 0 to 1 during periods of 
expected change in the level of Y, such as strikes, power outages, and war. For example, suppose Y is 
the daily death rate from automobile accidents in the United States. Suppose that on day 50 the speed 
limit is reduced from 65 mph to 55 mph. Suppose you have another 100 days of data after this 
intervention. In that case, designate X as 0 before day 50 and as 1 on and following day 50. The model 

$$Y_t = \beta_0 + \beta_1 X_t + Z_t$$

explains Y in terms of two means (plus the error term). Before day 50 the mean is

$$\beta_0 + (\beta_1)(0) = \beta_0$$

and on and following day 50 the mean is $\beta_0 + \beta_1$. Thus, $\beta_1$ is the effect of a lower speed limit, and its
statistical significance can be judged based on the t test for $H_0$: $\beta_1 = 0$.

If the model is fit by ordinary regression but the Zs are autocorrelated, this t test is not valid. Using 
PROC ARIMA to fit the model allows a valid test; supplying future values of the deterministic X 
produces forecasts with valid forecast intervals. 

The 1s and 0s can occur in any meaningful place in X. For example, if the speed limit reverts to 65
mph on day 70, you set X back to 0 starting on day 70.

If a data point is considered an outlier, you can use an indicator variable that is 1 only for that data
point in order to eliminate its influence on the ARMA parameter estimates. Deleting the point results
in a missing value (.) in the series; closing the gap with a DELETE statement makes the lags across
the gap incorrect. You can avoid these problems with the indicator variable approach. PROC
ARIMA also provides an outlier detection routine.
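
A minimal sketch of the speed limit example follows. The data set DEATHS and its variables DAY and Y are hypothetical, and the AR(1) error structure is an assumption made only so that the ESTIMATE statement is complete.

```sas
DATA DEATHS2;
   SET DEATHS;             /* hypothetical input with variables DAY and Y */
   X = (DAY >= 50);        /* 0 before day 50, 1 on and following day 50 */
RUN;

PROC ARIMA DATA=DEATHS2;
   IDENTIFY VAR=Y CROSSCOR=(X) NOPRINT;
   /* The coefficient on X estimates the shift in the mean of Y */
   ESTIMATE INPUT=(X) P=1 METHOD=ML;
RUN;
```

If the limit reverted on day 70, the assignment would become X = (50 <= DAY < 70). An outlier on a single day, say day 83, would be handled the same way with an indicator such as X83 = (DAY = 83).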


4.2.3 Case 2: Simple Transfer Function 

In this case, the model is 

$$Y_t = \beta_0 + \beta_1 X_t + Z_t$$

where $X_t$ and $Z_t$ are independent ARIMA processes. Because X is an ARIMA process, you can
estimate a model for X in PROC ARIMA and use it to forecast future Xs. The algorithm allows you 
to compute forecast error variances for these future Xs, which are automatically incorporated later 
into the Y forecast intervals. First, however, you must identify a model and fit it to the Z series. You 
accomplish this by studying the ACF, IACF, and PACF of residuals from a regression of Y on X. In 
fact, you can accomplish this entire procedure within PROC ARIMA. Once you have identified and 
fit models for X and Z, you can produce forecasts and associated intervals easily.

You can use several explanatory variables, but for proper forecasting they should be independent of
one another. If the explanatory variables contain arbitrary correlations, use the STATESPACE 
procedure, which takes advantage of these correlations to produce forecast intervals. 
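
The case 2 sequence for a single input can be sketched in one PROC ARIMA step. The data set name EXAMP, the AR(1) model for X, the MA(1) model for Z, and the output data set FOREY are all assumptions chosen for illustration; in practice the orders come from the identification described above.

```sas
PROC ARIMA DATA=EXAMP;
   /* Step 1: identify and fit a model for the input series X */
   IDENTIFY VAR=X;
   ESTIMATE P=1 METHOD=ML;
   /* Step 2: regress Y on X with a time series error; PLOT shows the
      ACF, IACF, and PACF of the residuals */
   IDENTIFY VAR=Y CROSSCOR=(X) NOPRINT;
   ESTIMATE INPUT=(X) Q=1 PLOT METHOD=ML;
   /* Step 3: forecast Y; future Xs are forecast from the step 1 model */
   FORECAST LEAD=12 OUT=FOREY;
RUN;
```

Because the model for X is fit in the same PROC ARIMA step, the procedure can forecast X itself and carry the forecast error variance of X into the intervals for Y.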


4.2.4 Case 3: General Transfer Function 

In case 3, you allow the target series Y to depend on current and past values of the explanatory 
variable X. The model is 

$$Y_t = \alpha + \sum_{j=0}^{\infty} \beta_j X_{t-j} + Z_t$$

where X and Z are independent ARIMA time series. Because it is impossible to fit an infinite number
of unrestricted $\beta$s to a finite data set, you restrict the $\beta$s to have certain functional forms depending
on only a few parameters. The appropriate form for a given data set is determined by an identification
process for the $\beta$s that is very similar to the usual identification process with the ACFs. Instead of
inspecting autocorrelations, you inspect cross-correlations; but you are looking for the same patterns
as in univariate ARIMA modeling. The $\beta$s are called transfer function weights or impulse-response
weights.

You can use several explanatory Xs, but they should be independent of one another for proper
forecasting and identification of the $\beta$s. Even if you can identify the model properly, correlation
among explanatory variables causes incorrect forecast intervals because the procedure assumes
independence when it computes forecast error variances.

Because you need forecasts of explanatory variables to forecast the target series, it is crucial that X 
does not depend on past values of Y. Such a dependency is called feedback. Feedback puts you in a 
circular situation where you need forecasts of X to forecast Y and forecasts of Y to forecast X. You 
can use PROC STATESPACE to model a series with arbitrary forms of feedback and cross- 
correlated inputs. Strictly AR models, including feedback, can be fit by multiple regression as proved 
by Fuller (1996). A general approach to AR modeling by nonlinear regression is also given by 
Fuller (1986). 


4.2.5 Case 3A: Leading Indicators 

Suppose in the model above you find that 

$$\beta_0 = \beta_1 = 0, \quad \beta_2 \neq 0$$

Then Y responds two periods later to movements in X. X is called a leading indicator for Y because 
its movements allow you to predict movements in Y two periods ahead. The lead of two periods is 
also called a shift or a pure delay in the response of Y to X. Such models are highly desirable for 
forecasting. 
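
In PROC ARIMA a pure delay is written as a shift in the INPUT= option. The sketch below assumes a data set EXAMP with series Y and X, an AR(1) model for X, and an MA(1) error; these choices are illustrative only.

```sas
PROC ARIMA DATA=EXAMP;
   /* Model the leading indicator first so that it can be forecast */
   IDENTIFY VAR=X NOPRINT;
   ESTIMATE P=1 METHOD=ML;
   IDENTIFY VAR=Y CROSSCOR=(X) NOPRINT;
   /* 2 $ X requests a shift of two periods: Y at time t depends on X at t-2 */
   ESTIMATE INPUT=(2 $ X) Q=1 METHOD=ML;
   FORECAST LEAD=4 OUT=FOREY;
RUN;
```

With a shift of two periods, the first two forecasts of Y use observed values of X, so only forecasts beyond that horizon rely on forecasts of X.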

4.2.6 Case 3B: Intervention

You can use an indicator variable as input in case 3B, as was suggested in case 1A. However, you
identify the pattern of the $\beta$s differently than in case 3. In case 3 cross-correlations are the key to
identifying the $\beta$ pattern, but in case 3B cross-correlations are virtually useless.


4.3 Methodology and Example 


4.3.1 Case 1: Regression with Time Series Errors 

In this example, a manufacturer of building supplies monitors sales (S) for one of his product lines in 
terms of disposable income (D), U.S. housing starts (H), and mortgage rates (M). The data are 
obtained quarterly. Plots of the four series are given in Output 4.8. 

The first task is to determine the differencing desired. Each series has a fairly slowly decaying ACF, 
and you decide to use a differenced series. Each first differenced series has an ACF consistent with 
the assumption of stationarity. The D series has differences that display a slight, upward trend. This 
trend is not of concern unless you plan to model D. Currently, you are using it just as an explanatory 
variable. The fact that you differenced all the series (including sales) implies an assumption about the 
error term. Your model in the original levels of the variable is 

$$S_t = \beta_0 + \beta_1 D_t + \beta_2 H_t + \beta_3 M_t + \eta_t$$

When you lag by 1, you get

$$S_{t-1} = \beta_0 + \beta_1 D_{t-1} + \beta_2 H_{t-1} + \beta_3 M_{t-1} + \eta_{t-1}$$

When you subtract, you get

$$\nabla S_t = 0 + \beta_1 \nabla D_t + \beta_2 \nabla H_t + \beta_3 \nabla M_t + \nabla \eta_t$$

Output 4.8 Plotting Building- and Manufacturing-Related Quarterly Data

[Plots against DATE: SALES OF BUILDING SUPPLIES (SALES), DISPOSABLE PERSONAL INCOME (INCOME), U.S. HOUSING STARTS (STARTS), and MORTGAGE RATES (RATES)]

Thus, differencing implies that $\eta_t$ had a unit root nonstationarity, so the differenced error series is
stationary. This assumption, unlike assumptions about the explanatory series, is crucial. If you do not
want to make this assumption, you can model the series in the original levels. Also, in the
development above, you assume a simple intercept $\beta_0$ that canceled out of the differenced model. If,
in fact, a trend $\beta_0 + \psi t$ is present, the differenced series has intercept $\psi$. If you had decided to fit the
model in the original levels and to allow only AR error structures, PROC AUTOREG or Fuller's
PROC NLIN method (1986) would have been an appropriate tool for the fitting.

Assuming differencing is appropriate, your next task is to output the residuals from regression and to 
choose a time series model for the error structure $\nabla \eta_t$. To accomplish this in PROC ARIMA, you
must modify your IDENTIFY and ESTIMATE statements. The IDENTIFY statement is used to call 
in all explanatory variables of interest and to declare the degree of differencing for each. The 
CROSSCOR= option accomplishes this goal. You specify the following SAS statements: 

PROC ARIMA DATA=HOUSING; 

TITLE 'MODEL IN FIRST DIFFERENCES'; 

IDENTIFY VAR=SALES(1) CROSSCOR=(MORT(1) DPIC(1) STARTS(1)) NOPRINT;

RUN; 


The NOPRINT option eliminates the printing of the cross-correlation function. Because you assume 
a contemporaneous relationship between sales and the explanatory variables, you do not check the 
cross-correlation function for dependence of sales on lagged values of the explanatory variables. If 
you want to check for lagged dependencies, you need to model the explanatory series to perform 
prewhitening. This is the only way you can get clear information from the cross-correlations. 
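
If you did want to check for lagged dependence, one way to obtain prewhitened cross-correlations is to fit a model to the explanatory series before it is named in CROSSCOR=. The sketch below does this for housing starts alone; the AR(1) model for the differenced STARTS series is an assumption made for illustration and would need to be identified from the data.

```sas
PROC ARIMA DATA=HOUSING;
   /* Fit a model to the differenced input; AR(1) is illustrative only */
   IDENTIFY VAR=STARTS(1) NOPRINT;
   ESTIMATE P=1 METHOD=ML;
   /* The cross-correlations printed here are prewhitened by that model */
   IDENTIFY VAR=SALES(1) CROSSCOR=(STARTS(1));
RUN;
```

Note that NOPRINT is omitted from the second IDENTIFY statement so that the cross-correlation function is printed.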

To run a regression of SALES(1) on MORT(1), DPIC(1), and STARTS(1), add the following
statement to your PROC ARIMA code: 

ESTIMATE INPUT=(MORT DPIC STARTS) PLOT METHOD=ML; 

RUN; 


The INPUT= option denotes which variables in the CROSSCOR= list are to be used in the 
regression. Specifying differencing in the INPUT= option is not allowed. The order of differencing in 
the CROSSCOR= list is the order used. The PLOT option creates and plots the ACF, IACF, and 
PACF of the residuals. The results are shown in Output 4.9. 





Output 4.9 Using the INPUT= Option of the ESTIMATE Statement to Run a Regression: PROC ARIMA

                              The ARIMA Procedure

                         Maximum Likelihood Estimation

                            Standard              Approx
Parameter      Estimate        Error   t Value   Pr > |t|   Lag   Variable   Shift
MU            170.03857    181.63744      0.94     0.3492     0   SALES          0
NUM1         -151.07400    112.12506     -1.35     0.1779     0   MORT           0
NUM2           -1.00212      4.22489     -0.24     0.8125     0   DPIC           0
NUM3            4.93009      0.39852     12.37     <.0001     0   STARTS         0

Constant Estimate       170.0386
Variance Estimate       200686.5
Std Error Estimate      447.9805
AIC                     605.6806
SBC                     612.4362
Number of Residuals           40


             Correlations of Parameter Estimates

Variable                  SALES     MORT     DPIC   STARTS
            Parameter        MU     NUM1     NUM2     NUM3
SALES       MU            1.000   -0.028   -0.916    0.016
MORT        NUM1         -0.028    1.000   -0.071    0.349
DPIC        NUM2         -0.916   -0.071    1.000   -0.039
STARTS      NUM3          0.016    0.349   -0.039    1.000





                      Autocorrelation Check of Residuals

 To      Chi-            Pr >
Lag    Square    DF     ChiSq    ---------------Autocorrelations---------------
  6      8.90     6    0.1793   -0.397  -0.028  -0.012   0.128  -0.088   0.138
 12     10.27    12    0.5923   -0.099   0.035  -0.033  -0.099   0.053   0.026
 18     16.98    18    0.5245   -0.011  -0.122   0.143  -0.189   0.014   0.156
 24     23.37    24    0.4981   -0.033  -0.169   0.174  -0.043  -0.039   0.093




                  The ARIMA Procedure

            Autocorrelation Plot of Residuals

Lag     Covariance    Correlation    Std Error
  0         200687        1.00000            0
  1     -79669.125        -.39698     0.158114
  2      -5682.473        -.02832     0.181328
  3      -2373.543        -.01183     0.181438
  4      25596.562        0.12754     0.181458
  5     -17570.433        -.08755     0.183685
  6      27703.666        0.13804     0.184725
  7     -19780.522        -.09856     0.187287
  8       7007.339        0.03492     0.188579
  9      -6640.664        -.03309     0.188741
 10     -19897.518        -.09915     0.188886

Output 4.9 Using the INPUT= Option of the ESTIMATE Statement to Run a Regression: PROC ARIMA
(continued)

                 Inverse Autocorrelations

Lag    Correlation
  1        0.50240
  2        0.20274
  3       -0.05699
  4       -0.23103
  5       -0.18394
  6       -0.15639
  7        0.03430
  8        0.11237
  9        0.15648
 10        0.13664



                 Partial Autocorrelations

Lag    Correlation
  1       -0.39698
  2       -0.22069
  3       -0.14121
  4        0.07283
  5       -0.00035
  6        0.16471
  7        0.03533
  8        0.02720
  9       -0.02400
 10       -0.20475






            The ARIMA Procedure

          Model for Variable SALES

Estimated Intercept             170.0386
Period(s) of Differencing              1

               Input Number 1

Input Variable                      MORT
Period(s) of Differencing              1
Overall Regression Factor       -151.074

               Input Number 2

Input Variable                      DPIC
Period(s) of Differencing              1
Overall Regression Factor       -1.00212

               Input Number 3

Input Variable                    STARTS
Period(s) of Differencing              1
Overall Regression Factor       4.930094

Output from the ESTIMATE statement for the sales data indicates that sales are positively related
to housing starts but negatively related to mortgage rates and disposable personal income. In
terms of significance, only the t statistic for housing starts exceeds 2.

However, unless you fit the correct model, the t statistics are meaningless. The correct model 
includes specifying the error structure, which you have not yet done. For the moment, ignore these t 
statistics. You may argue based on the chi-square checks that the residuals are not autocorrelated.
However, because the first chi-square statistic uses six correlations, the influence of a reasonably 
large correlation at lag 1 may be lessened to such an extent by the other five small correlations that 
significance is lost. Look separately at the first few autocorrelations, and remember that differencing 
is often accompanied by an MA term. Thus, you fit a model to the error series and wait to judge the 
significance of your t statistics until all important variables (including lagged error values) have been 
incorporated into the model. You use the same procedure here as in regression settings, where you do 
not use the t statistic for a variable in a model with an important explanatory variable omitted. 
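
The dilution argument can be made concrete with the numbers in Output 4.9. The sketch below assumes that the chi-square check has the Ljung-Box form with n = 40 residuals; that assumption reproduces the printed value of 8.90 from the six printed autocorrelations.

$$
Q_6 = n(n+2)\sum_{k=1}^{6}\frac{r_k^2}{n-k}
    = 1680\left(\frac{0.397^2}{39} + \frac{0.028^2}{38} + \frac{0.012^2}{37}
      + \frac{0.128^2}{36} + \frac{0.088^2}{35} + \frac{0.138^2}{34}\right) \approx 8.9
$$

The lag 1 term alone contributes $1680(0.397^2/39) \approx 6.8$. Taken by itself, that exceeds 3.84, the 5% critical value of a chi-square with 1 degree of freedom, whereas the full statistic of 8.9 falls well short of 12.59, the 5% critical value with 6 degrees of freedom. Equivalently, the lag 1 autocorrelation of -0.39698 is about 2.5 times its standard error of 0.158114.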

Based on the ACF of the differenced series, you fit an MA(1) model to the errors. You interpret the
ACF of the differenced series as having a nonzero value (-0.39698) at lag 1 and a near-zero value
at the other lags. Also, check the IACF to see if you have overdifferenced the series. If you have,
the IACF dies off very slowly. Suppose you decide the IACF dies off rapidly enough and that you
were correct to difference. Note that if

$$Y_t = \alpha + \beta X_t + \eta_t$$

where X and $\eta$ are unit root processes, regression of Y on X produces an inconsistent estimate of $\beta$.
This makes it impossible for you to use the PLOT option in a model in the original levels of the
series to determine if you should difference. Residuals from the model may not resemble the true
errors in the series because the estimate of $\beta$ is inconsistent. Because the explanatory series seems to
require differencing, you decide to model the SALES series in differences also and then to check for
overdifferencing with the PLOT option. Overdifferencing also results in an MA coefficient that is an
estimate of 1.

The next step, then, is to fit the regression model with an MA error term. You can accomplish this in 
PROC ARIMA by replacing the ESTIMATE statement above with 

ESTIMATE INPUT=(MORT DPIC STARTS) Q=1 METHOD=ML; 

RUN; 

The results are shown in Output 4.10.

Output 4.10 Fitting the Regression Model with an MA Error Term: PROC ARIMA

                      The ARIMA Procedure
                  MODEL IN FIRST DIFFERENCES

Estimation Method                            Maximum Likelihood
Parameters Estimated                                          5
Termination Criteria        Maximum Relative Change in Estimates
Iteration Stopping Value                                  0.001
Criteria Value                                         27.51793
Maximum Absolute Value of Gradient                     234254.1
R-Square Change from Last Iteration                    0.153278
Objective Function                      Log Gaussian Likelihood
Objective Function Value                               -289.068
Marquardt's Lambda Coefficient                             1E-8
Numerical Derivative Perturbation Delta                   0.001
Iterations                                                   13
Warning Message                Estimates may not have converged.



                         Maximum Likelihood Estimation

                            Standard              Approx
Parameter      Estimate        Error   t Value   Pr > |t|   Lag   Variable   Shift
MU             91.38149     47.30628      1.93     0.0534     0   SALES          0
MA1,1           0.99973     30.48149      0.03     0.9738     1   SALES          0
NUM1         -202.26240     60.63966     -3.34     0.0009     0   MORT           0
NUM2            0.89566      1.14910      0.78     0.4357     0   DPIC           0
NUM3            5.13054      0.28083     18.27     <.0001     0   STARTS         0

Constant Estimate       91.38149
Variance Estimate       115435.7
Std Error Estimate      339.7582
AIC                     588.1353
SBC                     596.5797
Number of Residuals           40





                  Correlations of Parameter Estimates

Variable                  SALES    SALES     MORT     DPIC   STARTS
            Parameter        MU    MA1,1     NUM1     NUM2     NUM3
SALES       MU            1.000    0.360   -0.219   -0.958   -0.612
SALES       MA1,1         0.360    1.000    0.159   -0.402   -0.314
MORT        NUM1         -0.219    0.159    1.000   -0.052    0.651
DPIC        NUM2         -0.958   -0.402   -0.052    1.000    0.457
STARTS      NUM3         -0.612   -0.314    0.651    0.457    1.000

Output 4.10 Fitting the Regression Model with an MA Error Term: PROC ARIMA (continued)

                      Autocorrelation Check of Residuals

 To      Chi-            Pr >
Lag    Square    DF     ChiSq    ---------------Autocorrelations---------------
  6      2.08     5    0.8382    0.017   0.079   0.076   0.147  -0.066   0.080
 12      6.50    11    0.8383   -0.128  -0.079  -0.144  -0.186  -0.037  -0.044
 18     11.43    17    0.8336   -0.120  -0.183   0.024  -0.137   0.040   0.074
 24     14.65    23    0.9067   -0.073  -0.045   0.108  -0.053  -0.073   0.088



Model for Variable SALES 

Estimated Intercept 91.38149 

Period(s) of Differencing 1 


Moving Average Factors 
Factor 1: 1 - 0.99973 B**(1) 


Input Number 1 

Input Variable MORT 
Period(s) of Differencing 1 
Overall Regression Factor -202.262 


Input Number 2 

Input Variable DPIC 
Period(s) of Differencing 1 
Overall Regression Factor 0.895659 


Input Number 3 


Input Variable STARTS 
Period(s) of Differencing 1 
Overall Regression Factor 5.130543 


You have used the generally more accurate maximum-likelihood (ML) method of estimation on the 
differenced series. Remember that the IDENTIFY statement determines the degree of differencing 
used. You should note that the MA parameter 0.99973 is not significant (p-value >.05). The
calculated t statistics on the explanatory variables have changed from the values they had in the
regression with no model for the error series. Also note that PROC AUTOREG, another SAS 
procedure for regression with time series errors, cannot be used here because it does not allow for 
differencing (a problem that can be alleviated in the DATA step but could be very cumbersome for 
handling the forecasts and standard errors) and because it works only with AR error terms. 

Something has happened here that can happen in practice and is worth noting. The moving average
parameter estimate is almost 1. A moving average parameter of 1 is exactly what would be expected
if the original regression model in series levels had a white noise error term. This in turn indicates
that just an ordinary regression would suffice to fit the model without any differencing being
required. Further inspection of the printout, however, reveals that this number may in fact not be a
good estimate of the true moving average parameter, as indicated by the message about estimates
not converging. Decisions made on the basis of this number therefore cannot be supported.

It is worth noting that since the first edition of this book, in which the example first appeared, some 
relevant developments have taken place. If a regression model with stationary errors is appropriate 
for data in which the variables themselves appear to be nonstationary, then these errors are a 
stationary linear combination of nonstationary variables. The variables, independent and dependent, 
are then said to be cointegrated. Tests for cointegration are available in PROC VARMAX, discussed 
in Chapter 5, Section 5.2. It will be seen that elimination of some seemingly unimportant input 
variables in the example results in a model that does not show this problem, and this is the route that 
will be taken here. However, a test for cointegration could also be used to make a more informed 
decision as to whether the differencing was appropriate. 

Any model that can be fit in PROC AUTOREG can also be fit in PROC ARIMA, which makes 
PROC ARIMA more generally applicable than PROC AUTOREG. The only advantage of PROC 
AUTOREG in this setting is its automatic selection of an AR model and, starting with Version 8 of 
SAS, its ability to handle strings of missing data. 

A final modeling step is to delete insignificant explanatory variables. Do not calculate SALES 
forecasts based on forecasts of unrelated series. If you do, the forecast error variance is unnecessarily 
large because the forecast then responds to fluctuations in irrelevant variables. Is it acceptable to 
eliminate simultaneously all variables with insignificant t statistics? No, it is not acceptable. 
Eliminating a single insignificant regressor, like DPIC, can change the t statistics on all remaining 
parameters. 

In the example above DPIC drifts upward, along with SALES. A nonzero MU in the differenced 
model also corresponds to drift in the original levels. The t statistic on MU is currently insignificant 
because DPIC takes over as the explainer of drift if MU is removed. Similarly, MU takes over if 
DPIC is removed. However, if you remove both terms from the model, the fit deteriorates 
significantly. DPIC and MU have the lowest t statistics. Remove DPIC and leave MU in the model 
because it is much easier to forecast than DPIC. 

When DPIC is removed from the INPUT= list in your ESTIMATE statement, what happens then to 
the t test for MU? Omitting the insignificant DPIC results in a t statistic 3.86 (not shown) on MU. 
Also note that the other t statistics change but that the mortgage rates are still not statistically 
significant. Removing the mortgage rates from the INPUT= list results in a fairly simple model. 
Review the progression of your modeling thus far: 

□ You noticed that the inputs and the dependent variable SALES were nonstationary. 

□ You checked the residuals from a regression of differenced SALES on differenced 
DPIC, STARTS, and MORT. The residuals seemed stationary and reasonably 
invertible (in other words, the IACF died down reasonably fast). 

□ You used the PLOT option to identify an error term model that was MA(1). This 
term was problematic in that its estimate was near 1, it had a huge standard error, and 
the estimation procedure may not have converged. 

□ You used t statistics to sequentially remove insignificant terms and obtain

$$\nabla S_t = \psi + \beta \nabla H_t + e_t - \theta e_{t-1}$$

where

$\nabla$ indicates a first difference

$S_t$ is sales at time t

$H_t$ is U.S. housing starts at time t

$\psi$ is a constant (drift) that corresponds to the slope in a plot of the undifferenced series
against time.


The final MA estimate 0.60397 is not particularly close to 1, giving you some confidence that you
have not overdifferenced. No convergence problems remain at this point. (See Output 4.11.)

Consider two scenarios for forecasting this series. First, suppose you are supplied with future values
of housing starts $H_{t+1}$ from some source. You incorporate these into your data set along with missing
values for the unknown future values of SALES, and you call for a forecast. You do not supply
information about the forecast accuracy of future housing start values, nor can the procedure use such
information. It simply treats these futures as known values. In the second scenario, you model
housing starts and then forecast them from within PROC ARIMA. This, then, provides an example of
a case 2 problem.

For the first scenario, imagine you have been given future values of U.S. housing starts (the values 
are actually those that would be forecast from PROC ARIMA, giving you an opportunity to see the 
effect of treating forecasts as perfectly known values). The first step is to create a data set with future 
values for DATE and STARTS and missing values for SALES. This data set is then concatenated to 
the original data set. The combined data set COMB has eight values of future STARTS. Use the 
following SAS statements: 

PROC ARIMA DATA=COMB;
   IDENTIFY VAR=SALES(1) CROSSCOR=(STARTS(1)) NOPRINT;
   ESTIMATE Q=1 INPUT=(STARTS) METHOD=ML;
   FORECAST LEAD=8 ID=DATE INTERVAL=QTR OUT=FOR1;
   TITLE 'DATA WITH FORECASTS OF STARTS APPENDED AND SALES=.';
RUN;


The results are shown in Output 4.11. 

Output 4.11 Forecasting with Future Input Values and Missing Future Sales Values

DATA WITH FORECASTS OF STARTS APPENDED AND SALES=.

The ARIMA Procedure

Maximum Likelihood Estimation

                        Standard               Approx
Parameter   Estimate       Error    t Value   Pr > |t|   Lag   Variable   Shift
MU          91.99669    25.60324       3.59     0.0003     0   SALES          0
MA1,1        0.60397     0.15203       3.97     <.0001     1   SALES          0
NUM1         5.45100     0.26085      20.90     <.0001     0   STARTS         0

Constant Estimate      91.99669
Variance Estimate      152042.2
Std Error Estimate     389.9259
AIC                    594.1269
SBC                    599.1936
Number of Residuals    40
Output 4.11 Forecasting with Future Input Values and Missing Future Sales Values (continued)

Correlations of Parameter Estimates

Variable               SALES     SALES     STARTS
Parameter              MU        MA1,1     NUM1

SALES    MU            1.000    -0.170     0.010
SALES    MA1,1        -0.170     1.000     0.068
STARTS   NUM1          0.010     0.068     1.000

Autocorrelation Check of Residuals

To     Chi-             Pr >
Lag    Square    DF    ChiSq    -----------------Autocorrelations----------------
  6      2.07     5    0.8388   -0.100   0.030   0.142   0.112  -0.044   0.011
 12      6.81    11    0.8140   -0.134  -0.057  -0.092  -0.229  -0.061  -0.029
 18     11.77    17    0.8140   -0.187  -0.130   0.007  -0.078   0.059   0.115
 24     19.29    23    0.6844   -0.025   0.051   0.218  -0.010  -0.072   0.155

Model for Variable SALES

Estimated Intercept 91.99669

Period(s) of Differencing 1

Moving Average Factors
Factor 1: 1 - 0.60397 B**(1)

Forecasts for Variable SALES

Obs      Forecast    Std Error      95% Confidence Limits
 42    19322.8041     389.9259    18558.5634    20087.0447
 43    19493.4413     419.3912    18671.4496    20315.4329
 44    19484.7198     446.9181    18608.7764    20360.6631
 45    19576.7165     472.8452    18649.9570    20503.4759
 46    19679.9629     497.4227    18705.0324    20654.8935
 47    19794.3720     520.8417    18773.5409    20815.2030
 48    19857.6642     543.2521    18792.9096    20922.4189
 49    19949.6609     564.7740    18842.7242    21056.5976


The estimation is exactly the same as in the original data set because SALES has missing values for 
all future quarters, and thus these points cannot be used in the estimation. Because future values are 
available for all inputs, forecasts are generated. A request of LEAD=10 also gives only eight 
forecasts because only eight future STARTS are supplied. Note that future values were supplied to 
and not generated by the procedure. Forecast intervals are valid if you can guarantee the future values 
supplied for housing starts. Otherwise, they are too small. Section 4.3.2 displays a plot of the 
forecasts from this procedure and also displays a similar plot in which PROC ARIMA is used to 
forecast the input variable. See Output 4.13. 
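
The Std Error column in Output 4.11 can be checked by hand. The following is a minimal sketch in Python (not SAS) under the assumption used in this scenario, namely that future STARTS are known exactly, so the only forecast error comes from the differenced MA(1) noise. In that case the error in the level forecast at lead l is a sum of l noise terms, and its variance is sigma^2 (1 + (l - 1)(1 - theta)^2). The function name `forecast_std_errors` is hypothetical; the numbers 389.9259 and 0.60397 are taken from Output 4.11.

```python
import math


def forecast_std_errors(sigma, theta, leads):
    """Forecast standard errors for the level of a series whose first
    difference has MA(1) noise e_t - theta * e_{t-1}, inputs treated as known."""
    errors = []
    for lead in range(1, leads + 1):
        # Weights on past shocks in the level are 1, (1 - theta), (1 - theta), ...
        variance = sigma ** 2 * (1.0 + (lead - 1) * (1.0 - theta) ** 2)
        errors.append(math.sqrt(variance))
    return errors


# Std Error Estimate and MA1,1 estimate from Output 4.11
std_errors = forecast_std_errors(sigma=389.9259, theta=0.60397, leads=8)
for lead, se in enumerate(std_errors, start=1):
    print(lead, round(se, 2))
```

The printed values should agree with the Std Error column to within rounding of the reported estimates. They reflect only the noise in SALES, which is why the intervals are too narrow when the supplied STARTS are themselves forecasts.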

Predicted SALES are the same (recall that future values of STARTS in this example are the same as
those produced in PROC ARIMA), but forecast intervals differ considerably. Note that the general
increase in predicted SALES is caused by including the drift term $\psi$.
4.3.2 Case 2: Simple Transfer Functions

In case 2, housing starts $H_t$ are used as an explanatory variable for a company's sales. Using fitting
and diagnostic checking, you obtain the model

$$\nabla S_t = \psi + \beta \nabla H_t + \eta_t$$

where $\eta_t$ is the moving average

$$\eta_t = e_t - \theta e_{t-1}$$

In case 1, you supplied future values of $H_t$ to PROC ARIMA and obtained forecasts and forecast
intervals. The forecasts were valid, but the intervals were not large enough because future values of
housing starts were forecasts. In addition, you have the problem of obtaining these future values for
housing starts. In case 2, PROC ARIMA forecasts housing starts itself and correctly incorporates the
uncertainty of future housing start values into the sales forecast.

Step 1 in this methodology identifies and estimates a model for the explanatory variable $H_t$, U.S.
housing starts. The data are quarterly and, based on the usual criteria, the series should be
differenced. The differenced series $\nabla H_t$ shows some correlation at lag 4 but not enough to warrant a
span 4 difference. Use an AR factor to handle the seasonality of this series.

Diagnostic checking was done on the STARTS series $H_t$. The model

$$(1 - \alpha B^4)\nabla H_t = (1 - \theta B^3)e_t$$

fits well. In Section 4.3.1, the series was forecast eight periods ahead to obtain future values. You do
not need to request forecasts of your inputs (explanatory series) if your goal is only to forecast target
series (SALES, in this case). The procedure automatically generates forecasts of inputs that it needs,
but you do not see them unless you request them.

In step 2, an input series is used in an input option to identify and estimate a model for the target
series $S_t$. This part of the SAS code is the same as that in the previous example. The two steps must
be together in a single PROC ARIMA segment. The entire set of code is shown below and some of
the output is shown in Output 4.12. Some of the output has been suppressed (it was displayed
earlier). Also, forecast intervals are wider than in case 1, where forecasts of $H_t$ were taken from this
run and concatenated to the end of the data set instead of being forecast by the procedure. This made
it impossible to incorporate forecast errors for $H_t$ into the forecast of $S_t$. The SAS code follows:

PROC ARIMA DATA=HOUSING;
   TITLE 'FORECASTING STARTS AND SALES';
   IDENTIFY VAR=STARTS(1) NOPRINT;
   ESTIMATE P=(4) Q=(3) METHOD=ML NOCONSTANT;
   FORECAST LEAD=8;
   IDENTIFY VAR=SALES(1) CROSSCOR=(STARTS(1)) NOPRINT;
   ESTIMATE Q=1 INPUT=(STARTS) METHOD=ML NOPRINT;
   FORECAST LEAD=8 ID=DATE INTERVAL=QTR OUT=FOR2 NOPRINT;
RUN;
Output 4.12 Estimating Using Maximum Likelihood: PROC ARIMA

FORECASTING STARTS AND SALES

The ARIMA Procedure

Maximum Likelihood Estimation

                        Standard               Approx
Parameter   Estimate       Error    t Value   Pr > |t|   Lag
MA1,1        0.42332     0.15283       2.77     0.0056     3
AR1,1        0.28500     0.15582       1.83     0.0674     4

Variance Estimate      30360.5
Std Error Estimate     174.2426
AIC                    529.229
SBC                    532.6068
Number of Residuals    40

Correlations of Parameter Estimates

Parameter     MA1,1     AR1,1
MA1,1         1.000     0.193
AR1,1         0.193     1.000

Autocorrelation Check of Residuals

To     Chi-             Pr >
Lag    Square    DF    ChiSq    -----------------Autocorrelations----------------
  6      2.55     4    0.6351    0.197   0.106   0.034   0.053   0.004  -0.063
 12      9.90    10    0.4496   -0.021  -0.019  -0.033  -0.232  -0.167  -0.208
 18     14.58    16    0.5558   -0.153  -0.117  -0.114  -0.116  -0.058  -0.060
 24     18.62    22    0.6686   -0.096   0.118   0.109   0.068   0.066  -0.047
Output 4.12 Estimating Using Maximum Likelihood: PROC ARIMA (continued)

Model for Variable STARTS
Period(s) of Differencing 1
No mean term in this model.

Autoregressive Factors
Factor 1: 1 - 0.285 B**(4)

Moving Average Factors
Factor 1: 1 - 0.42332 B**(3)

Forecasts for variable STARTS

Obs     Forecast    Std Error     95% Confidence Limits
 42    1373.2426     174.2426    1031.7333    1714.7519
 43    1387.6693     246.4163     904.7022    1870.6364
 44    1369.1927     301.7971     777.6812    1960.7041
 45    1369.1927     318.0853     745.7569    1992.6284
 46    1371.2568     351.7394     681.8602    2060.6534
 47    1375.3683     382.4434     625.7930    2124.9437
 48    1370.1026     410.8593     564.8332    2175.3719
 49    1370.1026     430.6707     526.0035    2214.2016


You can now merge the data sets FOR1 and FOR2 from the previous two examples and plot the
forecasts and intervals on the same graph. This is illustrated in Output 4.13 to indicate the difference
in interval widths for these data.

The first graph gives forecast intervals that arose from using PROC ARIMA to forecast housing 
starts. The second plot gives these forecast intervals as a solid line along with intervals from the 
previous analysis (broken line), where the same future values for housing starts are read into the data 
set rather than being forecast by PROC ARIMA. Note how the broken line drastically underestimates 
the uncertainty in the forecasts. The narrower interval is questionable in light of the downturn in 
SALES at the end of the series. 
4.3.3 Case 3: General Transfer Functions

4.3.3.1 Model Identification

You have specified the ARMA model with backshift operators. For example, you can write the
ARMA(1,1) model

$$Y_t - \alpha Y_{t-1} = e_t - \theta e_{t-1}$$

as

$$(1 - \alpha B)Y_t = (1 - \theta B)e_t$$

or as

$$Y_t = (1 - \theta B)/(1 - \alpha B)\, e_t$$

or, finally, as

$$Y_t = e_t + (\alpha - \theta)e_{t-1} + \alpha(\alpha - \theta)e_{t-2} + \alpha^2(\alpha - \theta)e_{t-3} + \cdots$$

The pattern of the weights (coefficients on the $e_t$ terms) determines that the process has one AR and
one MA parameter in the same way the ACF does. For example, if

$$Y_t = e_t + 1.2e_{t-1} + .6e_{t-2} + .3e_{t-3} + .15e_{t-4} + .075e_{t-5} + \cdots$$

the weights are 1, 1.2, .6, .3, .15, .075, .... The pattern is characterized by one arbitrary change
(from 1 to 1.2) followed by exponential decay at the rate .5 (.6 = (.5)(1.2), .3 = (.5)(.6), ...). The
exponential decay tells you to put a factor $(1 - .5B)$ in the denominator of the expression multiplying
$e_t$ (in other words, $\alpha = .5$).

Because $1.2 = \alpha - \theta$, you see that $\theta = -.7$. The model, then, is

$$Y_t = (1 + .7B)/(1 - .5B)\, e_t$$

What have you learned from this exercise? First, you see that you can write any ARMA model by
setting Y equal to a ratio of polynomial factors in the backshift operator B operating on $e_t$. Next, you
see that if you can estimate the sequence of weights on the $e_t$ terms, you can determine how many AR
and MA lags you need. Finally, in this representation, you see that the numerator polynomial
corresponds to MA factors and the denominator corresponds to AR factors.
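
The weight pattern in this exercise can be reproduced numerically. The following is a minimal sketch in Python (not SAS); the function name `arma11_psi_weights` is hypothetical. It expands the ratio (1 - theta B)/(1 - alpha B) using the recursion implied by the last form of the model above: the first weight is 1, the second is alpha - theta, and each later weight is alpha times the one before it.

```python
def arma11_psi_weights(alpha, theta, count):
    """Weights on e_t, e_{t-1}, ... for (1 - alpha B) Y_t = (1 - theta B) e_t."""
    weights = [1.0]
    for j in range(1, count):
        if j == 1:
            # The one "arbitrary" change: alpha - theta
            weights.append(alpha - theta)
        else:
            # Exponential decay at rate alpha from then on
            weights.append(alpha * weights[-1])
    return weights


# alpha = .5 and theta = -.7 are the values identified in the text
weights = arma11_psi_weights(alpha=0.5, theta=-0.7, count=6)
print([round(w, 3) for w in weights])
```

With alpha = .5 and theta = -.7 the list matches the weights 1, 1.2, .6, .3, .15, .075 used in the example.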

If you can apply a ratio of backshift polynomials to an unobserved error series $e_t$, why not apply one
to an observable input? This is exactly what you do in case 3. For example, suppose you write

$$Y_t - .8Y_{t-1} = 3(X_{t-1} - .4X_{t-2}) + \eta_t$$

where $\eta_t$ is the moving average

$$\eta_t = e_t + .6e_{t-1}$$

You then obtain

$$(1 - .8B)Y_t = 3(1 - .4B)X_{t-1} + (1 + .6B)e_t$$

or

$$Y_t = 0 + 3(1 - .4B)/(1 - .8B)\, X_{t-1} + (1 + .6B)/(1 - .8B)\, e_t$$

This is called a transfer function. Y is modeled as a function of lagged values of the input series X
and current and lagged values of the shocks $e_t$. Usually, the intercept is not 0, although for simplicity
0 is used in the preceding example.

You now have a potentially useful model, but how is it used? With real data, how will you know the
form of the backshift expression that multiplies $X_{t-1}$? The answer is in the cross-correlations. Define
the cross-covariance as

$$\gamma_{YX}(j) = \mathrm{cov}(Y_t, X_{t+j})$$

and

$$\gamma_{XY}(j) = \mathrm{cov}(X_t, Y_{t+j}) = \mathrm{cov}(X_{t-j}, Y_t) = \gamma_{YX}(-j)$$

Estimate $\gamma_{XY}(j)$ by

$$C_{XY}(j) = \sum_t (X_t - \bar{X})(Y_{t+j} - \bar{Y})/n$$

Define the cross-correlation as

$$\rho_{XY}(j) = \gamma_{XY}(j) / (\gamma_{XX}(0)\gamma_{YY}(0))^{.5}$$

Estimate this by

$$r_{XY}(j) = C_{XY}(j) / (C_{XX}(0)C_{YY}(0))^{.5}$$
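
The estimators just defined translate directly into code. The following is a minimal sketch in Python (not SAS), with hypothetical function names `cross_covariance` and `cross_correlation` and a small made-up data set used only to exercise them. Terms whose index t + j falls outside the sample are dropped, and the divisor is always n, as in the formula for the estimated cross-covariance.

```python
import math


def cross_covariance(x, y, lag):
    """C_XY(lag) = sum over t of (X_t - Xbar)(Y_{t+lag} - Ybar) / n."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    total = 0.0
    for t in range(n):
        s = t + lag
        if 0 <= s < n:  # skip pairs that fall outside the sample
            total += (x[t] - x_bar) * (y[s] - y_bar)
    return total / n


def cross_correlation(x, y, lag):
    """r_XY(lag) = C_XY(lag) / sqrt(C_XX(0) * C_YY(0))."""
    c_xx = cross_covariance(x, x, 0)
    c_yy = cross_covariance(y, y, 0)
    return cross_covariance(x, y, lag) / math.sqrt(c_xx * c_yy)


# Illustrative values only (not from the book's data)
x = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]
y = [0.0, 1.0, 3.0, 2.0, 5.0, 4.0]
for lag in (-1, 0, 1):
    print(lag, round(cross_correlation(x, y, lag), 4))
```

Note that `cross_covariance(x, y, j)` equals `cross_covariance(y, x, -j)`, which is the sample version of the symmetry relation between the two cross-covariance functions.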

To illustrate the theoretical cross-covariances for a transfer function, assume that X is a white noise
process independent of the error series $\eta_t$. The cross-covariances are computed below and are direct
multiples of $\gamma_{XX}(0)$, the variance of X (this holds only when X is white noise):

$$Y_t - .8Y_{t-1} = 3X_{t-1} - 1.2X_{t-2} + \eta_t$$

and

$$Y_t = 3X_{t-1} + 1.2X_{t-2} + .96X_{t-3} + .768X_{t-4} + \cdots + \text{noise}$$

Multiplying both sides by $X_{t-j}$, $j = 0, 1, 2, 3$, and computing expected values gives

$$\gamma_{XY}(0) = E(X_t Y_t) = 0$$

$$\gamma_{XY}(1) = E(X_t Y_{t+1}) = E(X_{t-1} Y_t) = 3\gamma_{XX}(0)$$

$$\gamma_{XY}(2) = E(X_t Y_{t+2}) = E(X_{t-2} Y_t) = 1.2\gamma_{XX}(0)$$

and

$$\gamma_{XY}(3) = E(X_t Y_{t+3}) = .96\gamma_{XX}(0)$$

When you divide each term in the cross-covariance sequence by $\gamma_{XX}(0)$, you obtain the weights $\beta_j$:

LAG j              -1    0    1    2      3      4                5
WEIGHT $\beta_j$    0    0    3    1.2    .96    $(1.2)(.8)^2$    $(1.2)(.8)^3$

Note that the model involves

$$3(1 - .4B)/(1 - .8B)\, X_{t-1} = 0X_t + 3X_{t-1} + 1.2X_{t-2} + .96X_{t-3} + \cdots$$

so if X is white noise, the cross-covariances are proportional to the transfer function weights $\beta_j$.

These weights $\beta_j$ are also known as the impulse-response function. The reason for this name is clear
if you ignore the error term in the model and let X be a pulse; that is, $X_t = 0$ except at $t = 10$, where
$X_{10} = 1$. Ignoring the white noise term, you have

$$Y_t = 3X_{t-1} + 1.2X_{t-2} + .96X_{t-3} + \cdots$$

so

$$Y_0 = Y_1 = Y_2 = \cdots = Y_{10} = 0$$

$$Y_{11} = 3X_{10} + 1.2X_9 + .96X_8 + \cdots = 3$$

$$Y_{12} = 3(0) + 1.2(1) + .96(0) + \cdots = 1.2$$

and

$$Y_{13} = .96$$

The weights are the expected responses to a pulse input. The pulse is delayed by one period. Its effect
continues to be felt starting with $t = 11$ but diminishes quickly because of the stationary denominator
(in other words, the AR-type operator $(1 - .8B)^{-1}$ on $X_{t-1}$).
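
The pulse experiment can be run directly from the difference-equation form of the model. The following is a minimal sketch in Python (not SAS); the function name `transfer_response` and its argument names are hypothetical. It ignores the noise term and applies $Y_t = .8Y_{t-1} + 3X_{t-1} - 1.2X_{t-2}$, which is $(1 - .8B)Y_t = 3(1 - .4B)X_{t-1}$ written recursively, with all values before the start of the series taken as 0.

```python
def transfer_response(x, scale=3.0, num=0.4, den=0.8, delay=1):
    """Noise-free output of Y_t = scale (1 - num B) / (1 - den B) X_{t-delay}."""
    def lagged(series, t, k):
        # Values before the start of the series are treated as 0
        return series[t - k] if t - k >= 0 else 0.0

    y = []
    for t in range(len(x)):
        previous = y[t - 1] if t > 0 else 0.0
        value = (den * previous
                 + scale * lagged(x, t, delay)
                 - scale * num * lagged(x, t, delay + 1))
        y.append(value)
    return y


# A unit pulse at t = 10, zero elsewhere
pulse = [0.0] * 16
pulse[10] = 1.0
response = transfer_response(pulse)
for t in range(10, 15):
    print(t, round(response[t], 4))
```

The response is 0 through t = 10 and then follows the weights 3, 1.2, .96, .768, ..., one period after the pulse.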

The crucial point is that if you can obtain the cross-correlations, you have the impulse-response
weight pattern, which you can then analyze by the same rules used for the ACFs. In the example
above, the 0 weight on $X_t$ indicates a pure delay. The arbitrary jump from 3 to 1.2, followed by
exponential decay at rate .8, indicates that the multiplier on $X_{t-1}$ has one numerator (MA) lag and one
denominator (AR) lag. The only problem is the requirement that X be white noise, which is
addressed below.

Suppose you have the same transfer function, but X is AR(1) with parameter $\alpha$. You have

$$Y_t = 0 + 3(1 - .4B)/(1 - .8B)\, X_{t-1} + (1 + .6B)/(1 - .8B)\, e_t$$

and

$$X_t = \alpha X_{t-1} + \epsilon_t$$

where the $X_t$ values are independent of the $e_t$ values and where $e_t$ and $\epsilon_t$ are two (independent)
white noise sequences.
Note that

$$Y_t = 3X_{t-1} + 1.2X_{t-2} + .96X_{t-3} + \cdots + \text{noise}_t$$

so

$$\alpha Y_{t-1} = 3\alpha X_{t-2} + 1.2\alpha X_{t-3} + .96\alpha X_{t-4} + \cdots + \alpha(\text{noise}_{t-1})$$

and

$$Y_t - \alpha Y_{t-1} = 3(X_{t-1} - \alpha X_{t-2}) + 1.2(X_{t-2} - \alpha X_{t-3}) + .96(X_{t-3} - \alpha X_{t-4}) + \cdots + N'_t$$

where $N'_t$ is a noise term. Set

$$Y_t - \alpha Y_{t-1} = Y'_t$$

and note that

$$X_t - \alpha X_{t-1} = \epsilon_t$$

is a white noise sequence, so the expression above becomes

$$Y'_t = 3\epsilon_{t-1} + 1.2\epsilon_{t-2} + .96\epsilon_{t-3} + \cdots + N'_t$$

The impulse-response function is exactly what you want, and the input $\epsilon_t$ is now a white noise
sequence.

You want to model X and use that model to estimate $Y'_t$ and $\epsilon_t$. This process is known as
prewhitening, although it really only whitens X. Next, compute the cross-correlations of the
prewhitened X and Y (in other words, the estimated $Y'_t$ and $\epsilon_t$). Note that the prewhitened variables
are used only to compute the cross-correlations. The parameter estimation in PROC ARIMA is
always performed on the original variables.
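
The effect of prewhitening can be seen in a small simulation. The following is a minimal sketch in Python (not SAS) and is not how PROC ARIMA implements the computation. It makes several assumptions for illustration: X is AR(1) with parameter .5, the noise added to Y is plain white noise rather than the ARMA noise of the example, the AR parameter is estimated crudely by the lag 1 autocorrelation of X, and all function names and the seed are hypothetical. Both series are passed through the same filter, and the cross-covariances of the filtered series are divided by the variance of the filtered X to estimate the weights.

```python
import random


def simulate(n, alpha=0.5, seed=12345):
    """X is AR(1); Y_t = .8 Y_{t-1} + 3 X_{t-1} - 1.2 X_{t-2} + white noise."""
    rng = random.Random(seed)
    x, y = [0.0, 0.0], [0.0, 0.0]
    for t in range(2, n + 2):
        x.append(alpha * x[t - 1] + rng.gauss(0.0, 1.0))
        y.append(0.8 * y[t - 1] + 3.0 * x[t - 1] - 1.2 * x[t - 2]
                 + rng.gauss(0.0, 1.0))
    return x[2:], y[2:]


def lag1_autocorrelation(series):
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    return sum(dev[t] * dev[t - 1] for t in range(1, n)) / sum(d * d for d in dev)


def ar1_filter(series, alpha):
    """Apply (1 - alpha B) to the mean-centered series."""
    mean = sum(series) / len(series)
    return [(series[t] - mean) - alpha * (series[t - 1] - mean)
            for t in range(1, len(series))]


def estimated_weights(x, y, max_lag):
    alpha_hat = lag1_autocorrelation(x)   # crude AR(1) fit for X
    ex = ar1_filter(x, alpha_hat)         # prewhitened X
    fy = ar1_filter(y, alpha_hat)         # Y passed through the same filter
    n = len(ex)
    var_ex = sum(v * v for v in ex) / n
    weights = []
    for lag in range(max_lag + 1):
        cov = sum(ex[t] * fy[t + lag] for t in range(n - lag)) / n
        weights.append(cov / var_ex)
    return alpha_hat, weights


x, y = simulate(20000)
alpha_hat, weights = estimated_weights(x, y, max_lag=3)
print(round(alpha_hat, 3), [round(w, 2) for w in weights])
```

With a long simulated series the estimated weights at lags 0 through 3 should be close to 0, 3, 1.2, and .96, the impulse-response weights of the transfer function.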


4.3.3.2 Statements for Transfer Function Modeling in the 
IDENTIFY Stage 

Use the IDENTIFY and ESTIMATE statements in PROC ARIMA to model X. A subsequent 
IDENTIFY statement for Y with the CROSSCOR=(X) option automatically prewhitens X and Y, 
using the previously estimated model for X. For this example, you specify the following SAS 
statements: 

PROC ARIMA DATA=TRANSFER;
   TITLE 'FITTING A TRANSFER FUNCTION';
   IDENTIFY VAR=X;
   ESTIMATE P=1;
   IDENTIFY VAR=Y CROSSCOR=(X) NLAG=10;
RUN;

The results are shown in Output 4.14. 
Output 4.14 Fitting a Transfer Function with the IDENTIFY and ESTIMATE Statements:
PROC ARIMA

FITTING A TRANSFER FUNCTION

The ARIMA Procedure

Name of Variable = X

Mean of Working Series    5.000159
Standard Deviation        1.168982
Number of Observations    500

Autocorrelations

Lag    Covariance    Correlation    Std Error
  0      1.366519        1.00000     0
  1      0.694896        0.50851     0.044721
  2      0.362484        0.26526     0.055085
  3      0.225201        0.16480     0.057583
  4      0.187569        0.13726     0.058519
  5      0.115383        0.08444     0.059159
  6      0.117494        0.08598     0.059400
  7      0.094599        0.06923     0.059648
  8     -0.010062        -.00736     0.059808
  9     -0.038517        -.02819     0.059810
 10     -0.029129        -.02132     0.059837
 11     -0.078230        -.05725     0.059852
 12     -0.153233        -.11213     0.059961
 13     -0.107038        -.07833     0.060379
 14     -0.101788        -.07449     0.060582
 15     -0.090632        -.06632     0.060765

Inverse Autocorrelations

Lag    Correlation
  1       -0.39174
  2       -0.00306
  3        0.03538
  4       -0.05678
  5        0.02589
  6       -0.01120
  7       -0.06020
  8        0.03468
  9        0.01185
 10       -0.02814
 11       -0.02175
 12        0.08543
 13       -0.03405
 14        0.00512
 15        0.03973
Output 4.14 Fitting a Transfer Function with the IDENTIFY and ESTIMATE Statements:
PROC ARIMA (continued)

Partial Autocorrelations

Lag    Correlation
  1        0.50851
  2        0.00900
  3        0.03581
  4        0.05194
  5       -0.01624
  6        0.04802
  7        0.00269
  8       -0.07792
  9       -0.00485
 10       -0.00078
 11       -0.05930
 12       -0.07466
 13        0.02408
 14       -0.02906
 15       -0.00077

Autocorrelation Check for White Noise

To     Chi-             Pr >
Lag    Square    DF    ChiSq    -----------------Autocorrelations----------------
  6    196.16     6    <.0001    0.509   0.265   0.165   0.137   0.084   0.086
 12    207.41    12    <.0001    0.069  -0.007  -0.028  -0.021  -0.057  -0.112
 18    217.70    18    <.0001   -0.078  -0.074  -0.066  -0.011   0.051   0.033
 24    241.42    24    <.0001    0.041   0.071   0.104   0.115   0.100   0.066

Conditional Least Squares Estimation

                        Standard               Approx
Parameter   Estimate       Error    t Value   Pr > |t|   Lag
MU           5.00569     0.09149      54.71     <.0001     0
AR1,1        0.50854     0.03858      13.18     <.0001     1

Constant Estimate      2.460094
Variance Estimate      1.017213
Std Error Estimate     1.00857
AIC                    1429.468
SBC                    1437.897
Number of Residuals    500
* AIC and SBC do not include log determinant.





















Correlations of Parameter Estimates

Parameter        MU     AR1,1
MU            1.000     0.005
AR1,1         0.005     1.000

Autocorrelation Check of Residuals

To     Chi-             Pr >
Lag    Square    DF    ChiSq    -----------------Autocorrelations----------------
  6      2.92     5    0.7120   -0.005  -0.012   0.004   0.062  -0.010   0.041
 12     11.37    11    0.4127    0.063  -0.041  -0.028   0.022  -0.006  -0.097
 18     15.99    17    0.5246   -0.005  -0.027  -0.054  -0.008   0.072  -0.007
 24     20.23    23    0.6278   -0.001   0.020   0.050   0.055   0.046   0.008
 30     24.29    29    0.7144   -0.013   0.050   0.042  -0.011  -0.005   0.055
 36     26.44    35    0.8506    0.016  -0.030  -0.014  -0.023   0.045   0.011
 42     29.67    41    0.9055   -0.033   0.043   0.012   0.017  -0.009   0.049
 48     34.37    47    0.9148    0.025  -0.021   0.018   0.024   0.038   0.071

Model for Variable X

Estimated Mean 5.005691

Autoregressive Factors
Factor 1: 1 - 0.50854 B**(1)

Name of Variable = Y

Mean of Working Series    10.05915
Standard Deviation        6.141561
Number of Observations    500
Output 4.14 Fitting a Transfer Function with the IDENTIFY and ESTIMATE Statements:
PROC ARIMA (continued)

Autocorrelations

Lag    Covariance    Correlation    Std Error
  0     37.718773        1.00000     0
  1     32.298318        0.85629     0.044721
  2     27.167391        0.72026     0.070235
  3     22.911916        0.60744     0.083714
  4     19.393508        0.51416     0.092109
  5     15.904433        0.42166     0.097680
  6     13.223877        0.35059     0.101255
  7     10.558154        0.27992     0.103655
  8      7.414747        0.19658     0.105156
  9      5.180506        0.13735     0.105888
 10      3.731949        0.09894     0.106244

Inverse Autocorrelations

Lag    Correlation
  1       -0.52149
  2        0.01944
  3        0.04247
  4       -0.07597
  5        0.06999
  6       -0.00905
  7       -0.08143
  8        0.07222
  9        0.00167
 10       -0.01040

Partial Autocorrelations

Lag    Correlation
  1        0.85629
  2       -0.04864
  3        0.00878
  4        0.00619
  5       -0.05155
  6        0.02489
  7       -0.04671
  8       -0.09421
  9        0.03137
 10        0.01915
Output 4.14 Fitting a Transfer Function with the IDENTIFY and ESTIMATE Statements:
PROC ARIMA (continued)

Autocorrelation Check for White Noise

To     Chi-             Pr >
Lag    Square    DF    ChiSq    -----------------Autocorrelations----------------
  6   1103.03     6    <.0001    0.856   0.720   0.607   0.514   0.422   0.351

Correlation of Y and X

Number of Observations              500
Variance of transformed series Y    14.62306
Variance of transformed series X    1.013152

Both series have been prewhitened.

Crosscorrelations

Lag    Covariance    Correlation
-10     -0.297935        -.07740
 -9     -0.147123        -.03822
 -8     -0.066199        -.01720
 -7     -0.145114        -.03770
 -6     -0.232308        -.06035
 -5      0.135384        0.03517
 -4      0.147029        0.03820
 -3      0.023585        0.00613
 -2      0.249856        0.06491
 -1      0.090743        0.02358
  0      0.089734        0.02331
  1      0.033887        0.00880
  2      3.053341        0.79327
  3      1.275799        0.33146
  4      0.920176        0.23906
  5      0.777420        0.20198
  6      0.806808        0.20961
  7      0.505834        0.13142
  8      0.577540        0.15005
  9      0.562041        0.14602
 10      0.254582        0.06614














Crosscorrelation Check Between Series

To     Chi-             Pr >
Lag    Square    DF    ChiSq    ----------------Crosscorrelations----------------
  5    418.85     6    <.0001    0.023   0.009   0.793   0.331   0.239   0.202

Both variables have been prewhitened by the following filter: 

Prewhitening Filter 

Autoregressive Factors 
Factor 1: 1 - 0.50854 B**(1) 

Data for this example are generated from the model

$$(Y_t - 10) - .8(Y_{t-1} - 10) = 3((X_{t-2} - 5) - .4(X_{t-3} - 5)) + N_t$$

where

$$(X_t - 5) = .5(X_{t-1} - 5) + \epsilon_t$$

The cross-correlations are near 0 until you reach lag 2. You now see a spike (0.79327) followed
by an arbitrary drop to 0.33146, followed by a roughly exponential decay. The one arbitrary drop
corresponds to one numerator (MA) lag $(1 - \theta B)$ and the exponential decay to one denominator (AR)
lag $(1 - \alpha B)$. The form of the transfer function is then

$$C(1 - \theta B)/(1 - \alpha B)\, X_{t-2} = (C - (C\theta)B)/(1 - \alpha B)\, X_{t-2}$$

Note the pure delay of two periods. The default in PROC ARIMA is to estimate the model with the C
multiplied through the numerator as shown on the right. The ALTPARM option gives the factored C
form as on the left.
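
The two parameterizations describe the same transfer function and differ only in how the numerator is reported. The following is a minimal sketch in Python (not SAS) with hypothetical function names. It converts a factored pair (C, theta) to the multiplied-through numerator coefficients and confirms that both forms give the same impulse-response weights, using C = 3, theta = .4, and denominator parameter .8 from the model that generated the data.

```python
def factored_to_default(c, theta):
    """C (1 - theta B) written as (C - (C theta) B): returns both coefficients."""
    return c, c * theta


def impulse_weights(num0, num1, den, count):
    """First `count` weights of (num0 - num1 B) / (1 - den B)."""
    weights = [num0, den * num0 - num1]
    while len(weights) < count:
        weights.append(den * weights[-1])
    return weights[:count]


num0, num1 = factored_to_default(c=3.0, theta=0.4)
default_form = impulse_weights(num0, num1, den=0.8, count=4)
# Factored form: scale the weights of (1 - theta B) / (1 - den B) by C
factored_form = [3.0 * w for w in impulse_weights(1.0, 0.4, den=0.8, count=4)]
print(round(num0, 4), round(num1, 4))
print([round(w, 4) for w in default_form])
```

Both lists of weights are 3, 1.2, .96, .768, so the choice between the default form and ALTPARM affects only which numbers appear in the parameter table.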

Now review the PROC ARIMA instructions needed to run this example. In INPUT=(form1 variable1
form2 variable2 ...), the specification for the transfer function form is

S$(L_{1,1}, L_{1,2}, ...) ... (L_{j,1}, ...)/(L_{j+1,1}, ...)

where

S is the shift or pure delay (2 in the example)

lag polynomials are written in multiplicative form

variable j is not followed by differencing numbers (this is done in
CROSSCOR).

For example, 

INPUT=(2$(1,3)(1)/(1)X) ALTPARM; 
indicates

$$Y_t = \theta_0 + \left(C(1 - \theta_1 B - \theta_2 B^3)(1 - \alpha B)/(1 - \delta B)\right)X_{t-2} + \text{noise}$$

Several numerator and denominator factors can be multiplied together. Note the absence of a transfer
function form in the sales and housing starts example, which assumes that only contemporaneous
relationships exist among sales, $S_t$, and the input variables.

For the current (generated data) example, the transfer function form should indicate a pure delay of
two (2$), one numerator (MA) lag (2$(1)), and one denominator lag (2$(1)/(1)). Use the PLOT
option to analyze the residuals and then estimate the transfer function with the noise model.

To continue with the generated data, add these SAS statements to those used earlier to identify and 
estimate the X model and to identify the Y model: 

   ESTIMATE INPUT=(2$(1)/(1)X) MAXIT=30
            ALTPARM PLOT METHOD=ML;
RUN;

The code above produces Output 4.15. Note the AR(1) nature of the autocorrelation plot of 
residuals. Continue with the following code to produce Output 4.16: 

   ESTIMATE P=1 INPUT=(2$(1)/(1)X)
            PRINTALL ALTPARM METHOD=ML;
   FORECAST LEAD=10 OUT=OUTDATA ID=T;
RUN;

DATA NEXT;
   SET OUTDATA;
   IF T > 480;
RUN;

PROC PRINT DATA=NEXT;
   TITLE 'FORECAST OUTPUT DATA SET';
RUN;

PROC GPLOT DATA=NEXT;
   PLOT L95*T U95*T FORECAST*T Y*T / OVERLAY HMINOR=0;
   SYMBOL1 V=L I=NONE C=BLACK;
   SYMBOL2 V=U I=NONE C=BLACK;
   SYMBOL3 V=F L=2 I=JOIN C=BLACK;
   SYMBOL4 V=A L=1 I=JOIN C=BLACK;
   TITLE 'FORECASTS FOR GENERATED DATA';
RUN;


4.3.3.3 Model Evaluation 

The estimated model is

$Y_t = -32.46 + 2.99(1 - .78B)^{-1}(1 - .37B) X_{t-2} + (1 - .79B)^{-1} \eta_t$

as shown in Output 4.16. Standard errors are (1.73), (.05), (.01), (.02), and (.03).
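
One way to judge these estimates is to compare them with the values implied by the generating model. The Python sketch below does this under one assumption that the text does not state: the noise term in the generating equation is white noise, so that dividing through by (1 - .8B) gives an AR(1) error with coefficient .8. The intercept follows from the means of 10 for Y and 5 for X and the steady-state gain 3(1 - .4)/(1 - .8) = 9.

```python
# Values implied by the generating model. The AR(1) noise value of 0.8
# assumes N_t is white noise (an assumption, not stated in the text).
C, theta, delta, phi = 3.0, 0.4, 0.8, 0.8
mean_x, mean_y = 5.0, 10.0

# Intercept = mean of Y minus (steady-state gain) * (mean of X)
gain = C * (1 - theta) / (1 - delta)
true_values = {"MU": mean_y - gain * mean_x, "AR1,1": phi, "SCALE1": C,
               "NUM1,1": theta, "DEN1,1": delta}

# (estimate, standard error) pairs from Output 4.16
estimates = {"MU": (-32.46316, 1.72566), "AR1,1": (0.79263, 0.02739),
             "SCALE1": (2.99340, 0.04554), "NUM1,1": (0.37258, 0.02286),
             "DEN1,1": (0.77872, 0.01369)}

# Distance of each estimate from its implied value, in standard errors
z_scores = {name: (est - true_values[name]) / se
            for name, (est, se) in estimates.items()}
for name, z in z_scores.items():
    print(f"{name:7s} implied={true_values[name]:8.3f} z={z:6.2f}")
```

Under these assumptions the implied intercept is -35, and every estimate lies within two standard errors of its implied value.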

In the autocorrelation and cross-correlation checks of residuals and input, note the following facts: 

□ Chi-square statistics automatically printed by PROC ARIMA are like the Q statistics
discussed earlier for standard PROC ARIMA models.

□ Cross-correlation of residuals with input implies improper identification of the
transfer function model. This is often accompanied by autocorrelation in residuals.

□ Autocorrelation of residuals not accompanied by cross-correlation of residuals with
X indicates that the transfer function is right but that the noise model is not properly
identified.
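
As a hedged illustration of the first point, the Python sketch below computes a Ljung-Box form of the Q statistic from the first six residual autocorrelations printed in Output 4.15 (rounded to three decimals) and the residual count of 497. The function name is hypothetical, and the use of the Ljung-Box form is an assumption about how the printed chi-square is computed.

```python
def ljung_box(autocorrs, n):
    """Ljung-Box Q statistic for residual autocorrelations r_1..r_m."""
    return n * (n + 2) * sum(r * r / (n - k)
                             for k, r in enumerate(autocorrs, start=1))

# First six residual autocorrelations and residual count from Output 4.15
r = [0.780, 0.607, 0.489, 0.365, 0.265, 0.201]
q = ljung_box(r, n=497)
print(round(q, 2))
```

The result agrees closely with the value of 731.66 printed for lag 6 in Output 4.15; small differences come from the rounding of the printed autocorrelations.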

See Output 4.15, from

ESTIMATE INPUT=(2$(1)/(1)X) . . . ;

versus Output 4.16, from

ESTIMATE P=1 . . . ;

Neither cross-correlation check indicates any problem with the transfer specification. When
forecasting, the inputs are forecast first and then used to forecast Y. In an example without
prewhitening, future values of X must be in the original data set.


Output 4.15 Fitting a Transfer Function: PROC ARIMA 


FITTING A TRANSFER FUNCTION 

The ARIMA Procedure 


Maximum Likelihood Estimation

                         Standard             Approx
Parameter    Estimate       Error   t Value  Pr > |t|   Lag  Variable  Shift
MU          -33.99761     0.77553    -43.84    <.0001     0  Y             0
SCALE1        3.06271     0.07035     43.53    <.0001     0  X             2
NUM1,1        0.39865     0.02465     16.17    <.0001     1  X             2
DEN1,1        0.79069   0.0079330     99.67    <.0001     1  X             2

Constant Estimate      -33.9976
Variance Estimate      2.702153
Std Error Estimate     1.643823
AIC                    1908.451
SBC                    1925.285
Number of Residuals         497











Correlations of Parameter Estimates

Variable                   Y         X         X         X
Parameter                 MU    SCALE1    NUM1,1    DEN1,1
Y    MU                1.000    -0.328    -0.347    -0.634
X    SCALE1           -0.328     1.000     0.689     0.291
X    NUM1,1           -0.347     0.689     1.000     0.821
X    DEN1,1           -0.634     0.291     0.821     1.000







Autocorrelation Check of Residuals

 To     Chi-           Pr >
Lag   Square    DF    ChiSq   ---------------Autocorrelations---------------
  6   731.66     6   <.0001    0.780   0.607   0.489   0.365   0.265   0.201
 12   751.66    12   <.0001    0.165   0.084   0.043   0.048   0.031   0.002
 18   771.14    18   <.0001   -0.027  -0.047  -0.096  -0.096  -0.095  -0.086
 24   784.80    24   <.0001   -0.087  -0.071  -0.062  -0.075  -0.059  -0.025
 30   802.67    30   <.0001   -0.021  -0.019  -0.048  -0.083  -0.111  -0.107
 36   862.27    36   <.0001   -0.116  -0.141  -0.147  -0.143  -0.142  -0.125
 42   892.19    42   <.0001   -0.108  -0.093  -0.106  -0.088  -0.090  -0.088
 48   913.92    48   <.0001   -0.041  -0.011   0.018   0.053   0.116   0.145





Autocorrelation Plot of Residuals (plot bars omitted)

Lag   Covariance   Correlation   Std Error
  0     2.702153       1.00000    0
  1     2.107649       0.77999    0.044856
  2     1.640982       0.60729    0.066785
  3     1.321663       0.48911    0.077100
  4     0.985109       0.36456    0.083109
  5     0.715251       0.26470    0.086267
  6     0.543254       0.20104    0.087886
  7     0.447149       0.16548    0.088806
  8     0.226080       0.08367    0.089424
  9     0.115412       0.04271    0.089582
 10     0.129203       0.04781    0.089623





Inverse Autocorrelations (plot bars omitted)

Lag   Correlation
  1      -0.49941
  2       0.05096
  3      -0.08471
  4       0.05085
  5      -0.00835
  6       0.06962
  7      -0.14938
  8       0.08073
  9       0.04528
 10      -0.04108








Partial Autocorrelations (plot bars omitted)

Lag   Correlation
  1       0.77999
  2      -0.00280
  3       0.04161
  4      -0.07432
  5      -0.01456
  6       0.02003
  7       0.03808
  8      -0.13172
  9       0.03341
 10       0.06932






Crosscorrelation Check of Residuals with Input X

 To     Chi-           Pr >
Lag   Square    DF    ChiSq   --------------Crosscorrelations---------------
  5     0.57     4   0.9668    0.031   0.008  -0.004  -0.007  -0.006   0.001
 11     1.93    10   0.9969    0.007  -0.009   0.004  -0.002  -0.018   0.048
 17     7.25    16   0.9682    0.056   0.042   0.037   0.043   0.044   0.027
 23    15.71    22   0.8300    0.032   0.072   0.055   0.058   0.057   0.035
 29    16.35    28   0.9603    0.019  -0.003   0.016   0.012  -0.019   0.013
 35    19.87    34   0.9743    0.075  -0.022  -0.002   0.029  -0.013   0.003
 41    23.89    40   0.9796    0.002   0.002  -0.019   0.026   0.070   0.047
 47    26.13    46   0.9919    0.021   0.023  -0.001  -0.019   0.032   0.047

Model for Variable Y 
Estimated Intercept -33.9976 

Input Number 1 

Input Variable X 

Shift 2 

Overall Regression Factor 3.062708 

Numerator Factors 
Factor 1: 1 - 0.39865 B**(1) 

Denominator Factors 
Factor 1: 1 - 0.79069 B**(1) 


Output 4.16 Modeling and Plotting Forecasts for Generated Data 

FITTING A TRANSFER FUNCTION 

The ARIMA Procedure 

Preliminary Estimation 

Initial Autoregressive Estimates

      Estimate
 1     0.85629

Constant Term Estimate       1.445571
White Noise Variance Est     10.06195


Conditional Least Squares Estimation

    SSE         MU    AR1,1   SCALE1   NUM1,1   DEN1,1   Constant   Lambda     R Crit
3908.38   10.05915  0.85629  3.01371  0.10000  0.10000   1.445571  0.00001          1
2311.96   -4.07135  0.90707  2.78324  0.45056  0.75974   -0.37835  0.00001   0.909428
 552.12   -39.3685  0.87876  2.98650  0.39526  0.81121    -4.7732     1E-6   0.881845
 532.23   -32.7390  0.82664  3.00105  0.39186  0.78952   -5.67548    0.001   0.257986
 516.31   -31.1812  0.77427  2.99250  0.36716  0.76939   -7.03863   0.0001   0.186947


Maximum Likelihood Estimation

   Loglike         MU    AR1,1   SCALE1   NUM1,1   DEN1,1   Constant   Lambda     R Crit
-711.36100   -31.1812  0.77427  2.99250  0.36716  0.76939   -7.03863  0.00001          1
-710.55902   -32.4260  0.79415  2.99417  0.37233  0.77860   -6.67482     1E-6   0.058893
-710.53822   -32.4551  0.79253  2.99332  0.37251  0.77865   -6.73363     1E-7   0.009205
-710.53820   -32.4632  0.79263  2.99340  0.37258  0.77872   -6.73204     1E-8   0.000325


ARIMA Estimation Optimization Summary

Estimation Method                          Maximum Likelihood
Parameters Estimated                       5
Termination Criteria                       Maximum Relative Change in Estimates
Iteration Stopping Value                   0.001
Criteria Value                             0.000248
Alternate Criteria                         Relative Change in Objective Function
Alternate Criteria Value                   3.152E-6
Maximum Absolute Value of Gradient         3.435957
R-Square Change from Last Iteration        0.000325
Objective Function                         Log Gaussian Likelihood
Objective Function Value                   -710.538
Marquardt's Lambda Coefficient             1E-8
Numerical Derivative Perturbation Delta    0.001
Iterations                                 3






Maximum Likelihood Estimation

                         Standard             Approx
Parameter    Estimate       Error   t Value  Pr > |t|   Lag  Variable  Shift
MU          -32.46316     1.72566    -18.81    <.0001     0  Y             0
AR1,1         0.79263     0.02739     28.94    <.0001     1  Y             0
SCALE1        2.99340     0.04554     65.74    <.0001     0  X             2
NUM1,1        0.37258     0.02286     16.30    <.0001     1  X             2
DEN1,1        0.77872     0.01369     56.86    <.0001     1  X             2

Constant Estimate      -6.73204
Variance Estimate      1.029993
Std Error Estimate     1.014886
AIC                    1431.076
SBC                    1452.119
Number of Residuals         497







Correlations of Parameter Estimates

Variable                   Y         Y         X         X         X
Parameter                 MU     AR1,1    SCALE1    NUM1,1    DEN1,1
Y    MU                1.000     0.028    -0.288    -0.384    -0.795
Y    AR1,1             0.028     1.000    -0.021     0.002    -0.012
X    SCALE1           -0.288    -0.021     1.000     0.070    -0.015
X    NUM1,1           -0.384     0.002     0.070     1.000     0.818
X    DEN1,1           -0.795    -0.012    -0.015     0.818     1.000





Autocorrelation Check of Residuals

 To     Chi-           Pr >
Lag   Square    DF    ChiSq   ---------------Autocorrelations---------------
  6     4.22     5   0.5184   -0.011  -0.040   0.063  -0.012  -0.037  -0.034
 12    19.06    11   0.0599    0.118  -0.071  -0.084   0.044   0.037  -0.003
 18    26.15    17   0.0718   -0.017   0.052  -0.098  -0.005  -0.029   0.017
 24    34.15    23   0.0630   -0.043   0.006   0.046  -0.073  -0.040   0.067
 30    39.57    29   0.0911   -0.006   0.072   0.010  -0.025  -0.066   0.002
 36    42.81    35   0.1710    0.019  -0.040  -0.038  -0.003  -0.049  -0.018
 42    52.32    41   0.1107   -0.007   0.036  -0.067   0.030  -0.010  -0.104
 48    58.30    47   0.1248    0.028  -0.003  -0.012  -0.045   0.085   0.026











Crosscorrelation Check of Residuals with Input X

 To     Chi-           Pr >
Lag   Square    DF    ChiSq   --------------Crosscorrelations---------------
  5     0.75     4   0.9447   -0.002   0.019  -0.020   0.005   0.018   0.020
 11     6.96    10   0.7294    0.026  -0.010   0.023  -0.002  -0.015   0.105
 17     7.70    16   0.9573    0.026   0.003   0.007   0.020   0.018  -0.008
 23    11.08    22   0.9736    0.013   0.074  -0.003   0.024   0.019  -0.015
 29    14.34    28   0.9846   -0.015  -0.029   0.035   0.001  -0.046   0.047
 35    28.14    34   0.7498   -0.131   0.060   0.024   0.051  -0.056   0.027
 41    33.95    40   0.7384   -0.001  -0.002  -0.028   0.067   0.079  -0.015
 47    38.77    46   0.7664   -0.021   0.011  -0.033  -0.026   0.077   0.038





Model for Variable Y
Estimated Intercept -32.4632

Autoregressive Factors
Factor 1: 1 - 0.79263 B**(1)

Input Number 1

Input Variable X

Shift 2

Overall Regression Factor 2.993395

Numerator Factors
Factor 1: 1 - 0.37258 B**(1)

Denominator Factors
Factor 1: 1 - 0.77872 B**(1)




Forecasts for Variable Y

Obs    Forecast   Std Error    95% Confidence Limits
501     12.7292      1.0149     10.7400      14.7183
502     10.9660      1.2950      8.4278      13.5042
503     10.7301      3.3464      4.1713      17.2889
504     10.5608      4.3680      1.9997      19.1219
505     10.4360      4.9805      0.6744      20.1976
506     10.3423      5.3556     -0.1545      20.8390
507     10.2709      5.5859     -0.6772      21.2190
508     10.2160      5.7270     -1.0088      21.4407
509     10.1735      5.8134     -1.2206      21.5675
510     10.1404      5.8661     -1.3569      21.6378




FORECAST OUTPUT DATA SET

Obs     T         Y   FORECAST       STD       L95       U95   RESIDUAL
  1   481    0.7780     0.7496   1.01489   -1.2396    2.7387    0.02846
  2   482    1.9415     3.3176   1.01489    1.3285    5.3068   -1.37609
  3   483    5.6783     4.3649   1.01489    2.3758    6.3541    1.31336
  4   484    8.1388     8.0927   1.01489    6.1036   10.0818    0.04612
  5   485    9.7920     9.6122   1.01489    7.6230   11.6013    0.17978

(more output lines)

 26   506               10.3423   5.35559   -0.1545   20.8390
 27   507               10.2709   5.58587   -0.6772   21.2190
 28   508               10.2160   5.72703   -1.0088   21.4407
 29   509               10.1735   5.81339   -1.2206   21.5675
 30   510               10.1404   5.86612   -1.3569   21.6378



FORECASTS FOR GENERATED DATA 
In addition to generated data, logarithms of flow rates for the Neuse River at Goldsboro, North
Carolina, and 30 miles downstream at Kinston, North Carolina, are analyzed. These data include 400 
daily observations. Obviously, the flow rates develop a seasonal pattern over the 365 days in a year, 
causing the ACF to die off slowly. Taking differences of the logarithmic observations produces 
ACFs that seem well behaved. The goal is to relate flow rates at Kinston to those at Goldsboro. The 
differenced data should suffice here even though nonstationarity is probably caused by the 365-day 
seasonal periodicity in flows. 

You can obtain a model for the logarithms of the Goldsboro flow rates by using the following SAS 
statements: 

PROC ARIMA DATA=RIVER;
   IDENTIFY VAR=LGOLD(1) NOPRINT;
   ESTIMATE Q=1 P=3 METHOD=ML NOCONSTANT MAXIT=100;
   IDENTIFY VAR=LKINS(1) CROSSCOR=(LGOLD(1));
   TITLE 'FLOW RATES OF NEUSE RIVER AT GOLDSBORO AND KINSTON';
RUN;


The results are shown in Output 4.17. 

Output 4.17 Analyzing Logarithms of Flow Data with the IDENTIFY and ESTIMATE Statements: 
PROC ARIMA 


FLOW RATES OF NEUSE RIVER AT GOLDSBORO AND KINSTON 

The ARIMA Procedure 


Maximum Likelihood Estimation

                         Standard             Approx
Parameter    Estimate       Error   t Value  Pr > |t|   Lag
MA1,1         0.87394     0.05878     14.87    <.0001     1
AR1,1         1.24083     0.07467     16.62    <.0001     1
AR1,2        -0.29074     0.08442     -3.44    0.0006     2
AR1,3        -0.11724     0.05394     -2.17    0.0297     3

Variance Estimate      0.039916
Std Error Estimate     0.199791
AIC                    -148.394
SBC                    -132.438
Number of Residuals         399


Correlations of Parameter Estimates

Parameter     MA1,1     AR1,1     AR1,2     AR1,3
MA1,1         1.000     0.745    -0.367     0.374
AR1,1         0.745     1.000    -0.783     0.554
AR1,2        -0.367    -0.783     1.000    -0.847
AR1,3         0.374     0.554    -0.847     1.000





Autocorrelation Check of Residuals

 To     Chi-           Pr >
Lag   Square    DF    ChiSq   ---------------Autocorrelations---------------
  6     0.77     2   0.6819    0.001   0.003   0.012  -0.032   0.020  -0.019
 12     8.67     8   0.3711    0.023  -0.032   0.074   0.041  -0.095  -0.040
 18    14.28    14   0.4287    0.040   0.006   0.071  -0.066   0.045   0.020
 24    19.19    20   0.5097    0.024   0.004   0.072  -0.014   0.071  -0.024
 30    22.15    26   0.6807    0.018   0.058  -0.052  -0.021   0.007  -0.001
 36    26.18    32   0.7555    0.005   0.025   0.022  -0.089  -0.008  -0.005
 42    32.18    38   0.7347    0.065   0.047   0.080  -0.016   0.006  -0.023
 48    34.21    44   0.8555    0.029   0.005  -0.007  -0.010   0.042   0.041





Model for Variable LGOLD

Period(s) of Differencing 1

No mean term in this model.

Autoregressive Factors
Factor 1: 1 - 1.24083 B**(1) + 0.29074 B**(2) + 0.11724 B**(3)

Moving Average Factors
Factor 1: 1 - 0.87394 B**(1)


Name of Variable = LKINS

Period(s) of Differencing                      1
Mean of Working Series                  0.006805
Standard Deviation                      0.152423
Number of Observations                       399
Observation(s) eliminated by differencing      1









Autocorrelations (plot bars omitted)

Lag    Covariance   Correlation   Std Error
  0      0.023233       1.00000    0
  1      0.012735       0.54814    0.050063
  2     0.0043502       0.18724    0.063343
  3     0.0022257       0.09580    0.064715
  4    -0.0002524       -.01086    0.065070
  5    -0.0025054       -.10784    0.065074
  6    -0.0037296       -.16053    0.065521
  7    -0.0047520       -.20454    0.066499
  8    -0.0040610       -.17480    0.068057
  9    -0.0028046       -.12072    0.069173
 10    -0.0033163       -.14274    0.069699
 11    -0.0034509       -.14853    0.070428
 12    -0.0026756       -.11517    0.071209
 13    -0.0012425       -.05348    0.071674
 14    0.00073313       0.03156    0.071774
 15     0.0014717       0.06334    0.071809


Inverse Autocorrelations (plot bars omitted)

Lag   Correlation
  1      -0.57039
  2       0.26078
  3      -0.13604
  4       0.05091
  5       0.01526
  6       0.02426
  7      -0.00102
  8       0.06337
  9      -0.05979
 10       0.03898
 11      -0.00474
 12       0.05098
 13      -0.04073
 14       0.08960
 15      -0.17275


Partial Autocorrelations (plot bars omitted)

Lag   Correlation
  1       0.54814
  2      -0.16184
  3       0.09580
  4      -0.12466
  5      -0.06238
  6      -0.08767
  7      -0.10017
  8      -0.00664
  9      -0.02847
 10      -0.10749
 11      -0.05583
 12      -0.05502
 13       0.00784
 14       0.04108
 15      -0.01643


Autocorrelation Check for White Noise

 To     Chi-           Pr >
Lag   Square    DF    ChiSq   ---------------Autocorrelations---------------
  6   153.89     6   <.0001    0.548   0.187   0.096  -0.011  -0.108  -0.161
 12   212.40    12   <.0001   -0.205  -0.175  -0.121  -0.143  -0.149  -0.115
 18   222.41    18   <.0001   -0.053   0.032   0.063   0.002   0.076   0.101
 24   231.81    24   <.0001    0.074   0.074   0.069   0.057   0.030  -0.050




Variable LGOLD has been differenced.

Correlation of LKINS and LGOLD

Period(s) of Differencing                          1
Number of Observations                           399
Observation(s) eliminated by differencing          1
Variance of transformed series LKINS        0.016725
Variance of transformed series LGOLD        0.039507

Both series have been prewhitened.







Crosscorrelations (plot bars omitted)

Lag    Covariance   Correlation
-15    -0.0005089       -.01980
-14    0.00075859       0.02951
-13    0.00003543       0.00138
-12    -0.0005770       -.02245
-11    -0.0004178       -.01625
-10    -0.0021530       -.08376
 -9    -0.0000599       -.00233
 -8     0.0010101       0.03929
 -7    0.00044281       0.01723
 -6    -0.0010635       -.04137
 -5    -0.0002580       -.01004
 -4    0.00001085       0.00042
 -3    0.00092003       0.03579
 -2     0.0042649       0.16592
 -1    -0.0000873       -.00339
  0    -0.0012901       -.05019
  1      0.017885       0.69577
  2     0.0092795       0.36100
  3    0.00031429       0.01223
  4     0.0017646       0.06865
  5    0.00069119       0.02689
  6    0.00027701       0.01078
  7    0.00061810       0.02405
  8    0.00012023       0.00468
  9    0.00043703       0.01700
 10    0.00098972       0.03850
 11    0.00044395       0.01727
 12    -0.0022054       -.08580
 13    -0.0013050       -.05077
 14    -0.0012982       -.05050
 15    0.00034243       0.01332


Crosscorrelation Check Between Series

 To     Chi-           Pr >
Lag   Square    DF    ChiSq   --------------Crosscorrelations---------------
  5   248.39     6   <.0001   -0.050   0.696   0.361   0.012   0.069   0.027
 11   249.50    12   <.0001    0.011   0.024   0.005   0.017   0.039   0.017
 17   263.88    18   <.0001   -0.086  -0.051  -0.051   0.013   0.136  -0.070
 23   266.85    24   <.0001    0.031   0.043  -0.019   0.001   0.064   0.013


Both variables have been prewhitened by the following filter: 
Prewhitening Filter 

Autoregressive Factors 

Factor 1: 1 - 1.24083 B**(1) + 0.29074 B**(2) + 0.11724 B**(3) 

Moving Average Factors 
Factor 1: 1 - 0.87394 B**(1) 


The output from the ESTIMATE statement shows a reasonable fit. Cross-correlations from the
second IDENTIFY statement show that a change in flow rates at Goldsboro affects the flow at
Kinston one and two days later, with little other effect. This suggests $C(1 - \theta B)X_{t-1}$ as a
transfer function. Add the following SAS statements to the code above:

ESTIMATE INPUT=(1$(1)LGOLD) PLOT METHOD=ML; 

RUN; 
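
The choice of a one-period shift with one numerator lag can be checked against the prewhitened cross-correlations at lags 0 through 5 in Output 4.17. The Python sketch below uses a rough two-standard-error limit of 2 divided by the square root of the number of observations, which is a common rule of thumb and an assumption here rather than the exact limit PROC ARIMA plots.

```python
import math

n = 399  # observations after differencing, from Output 4.17
# Prewhitened cross-correlations at lags 0 through 5 (Output 4.17)
ccf = {0: -0.050, 1: 0.696, 2: 0.361, 3: 0.012, 4: 0.069, 5: 0.027}

# Rough two-standard-error limit under the hypothesis of no correlation
bound = 2 / math.sqrt(n)
significant = [lag for lag, r in ccf.items() if abs(r) > bound]

# Relative size of the lag-2 cross-correlation to the lag-1 value
ratio = ccf[2] / ccf[1]
print(round(bound, 3), significant, round(ratio, 2))
```

Only lags 1 and 2 exceed the limit among these six lags, which is consistent with a shift of 1 and a single numerator lag.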

Results are shown in Output 4.18. Diagnostics from the PLOT option are used to identify an error
model. The cross-correlation check looks reasonable, but the autocorrelation check indicates
that the error is not white noise. This also implies that t statistics for the model parameters are
computed from improper standard errors.
Output 4.18 Modeling Flow Rates: Identifying an Error Model through the PLOT Option




FLOW RATES OF NEUSE RIVER AT GOLDSBORO AND KINSTON

The ARIMA Procedure

Maximum Likelihood Estimation

                         Standard             Approx
Parameter    Estimate       Error   t Value  Pr > |t|   Lag  Variable  Shift
MU          0.0018976   0.0043054      0.44    0.6594     0  LKINS         0
NUM1          0.43109     0.02107     20.46    <.0001     0  LGOLD         1
NUM1,1       -0.22837     0.02106    -10.84    <.0001     1  LGOLD         1

Constant Estimate      0.001898
Variance Estimate       0.00735
Std Error Estimate     0.085733
AIC                    -820.849
SBC                    -808.897
Number of Residuals         397







Correlations of Parameter Estimates

Variable               LKINS     LGOLD     LGOLD
Parameter                 MU      NUM1    NUM1,1
LKINS  MU              1.000    -0.018     0.020
LGOLD  NUM1           -0.018     1.000     0.410
LGOLD  NUM1,1          0.020     0.410     1.000






Autocorrelation Check of Residuals

 To     Chi-           Pr >
Lag   Square    DF    ChiSq   ---------------Autocorrelations---------------
  6    76.40     6   <.0001    0.315  -0.088  -0.195  -0.169  -0.106  -0.073
 12    87.15    12   <.0001   -0.105  -0.081   0.062   0.042   0.054  -0.013
 18    97.51    18   <.0001   -0.023  -0.005  -0.033  -0.140  -0.010   0.059
 24   114.97    24   <.0001    0.092   0.150   0.059   0.036  -0.033  -0.069
 30   120.35    30   <.0001   -0.044  -0.053  -0.056  -0.062  -0.023  -0.019
 36   131.49    36   <.0001    0.031   0.042   0.087   0.099   0.063  -0.039
 42   137.18    42   <.0001   -0.049  -0.042  -0.041  -0.004  -0.048  -0.068
 48   152.24    48   <.0001    0.018   0.128   0.103   0.056   0.013  -0.054




Autocorrelation Plot of Residuals (plot bars omitted)

Lag    Covariance   Correlation   Std Error
  0     0.0073501       1.00000    0
  1     0.0023175       0.31531    0.050189
  2    -0.0006451       -.08776    0.054952
  3    -0.0014358       -.19534    0.055304
  4    -0.0012412       -.16886    0.057016
  5    -0.0007808       -.10623    0.058262
  6    -0.0005350       -.07279    0.058747
  7    -0.0007739       -.10530    0.058974
  8    -0.0005976       -.08130    0.059446
  9    0.00045435       0.06182    0.059725
 10    0.00031116       0.04233    0.059886
 11    0.00039753       0.05408    0.059962
 12    -0.0000927       -.01261    0.060084
 13    -0.0001700       -.02312    0.060091
 14    -0.0000400       -.00545    0.060113
 15    -0.0002424       -.03298    0.060115
 16    -0.0010321       -.14042    0.060160
 17    -0.0000751       -.01021    0.060980
 18    0.00043128       0.05868    0.060984
 19    0.00067524       0.09187    0.061126
 20     0.0011004       0.14972    0.061473
 21    0.00043052       0.05857    0.062385
 22    0.00026751       0.03640    0.062523
 23    -0.0002427       -.03301    0.062577
 24    -0.0005082       -.06914    0.062621

"." marks two standard errors






Inverse Autocorrelations (plot bars omitted)

Lag   Correlation
  1      -0.28165
  2       0.20889
  3       0.12155
  4       0.07801
  5       0.10685
  6       0.05649
  7       0.03509
  8       0.17435
  9      -0.06121
 10       0.09295
 11      -0.00055
 12       0.03401
 13       0.03761
 14       0.01513
 15      -0.07029
 16       0.15677
 17      -0.09352
 18       0.00402
 19       0.02305
 20      -0.09855
 21       0.06347
 22      -0.07294
 23       0.02384
 24       0.00998






Partial Autocorrelations (plot bars omitted)

Lag   Correlation
  1       0.31531
  2      -0.20784
  3      -0.11186
  4      -0.09195
  5      -0.07202
  6      -0.08240
  7      -0.13441
  8      -0.07821
  9       0.04706
 10      -0.08698
 11       0.01795
 12      -0.07777
 13      -0.01386
 14      -0.01931
 15      -0.07046
 16      -0.15748
 17       0.07068
 18      -0.03508
 19       0.03301
 20       0.08679
 21      -0.00999
 22       0.07693
 23      -0.03975
 24      -0.01307




Crosscorrelation Check of Residuals with Input LGOLD

 To     Chi-           Pr >
Lag   Square    DF    ChiSq   --------------Crosscorrelations---------------
  5     4.08     5   0.5373    0.011  -0.012  -0.010   0.071   0.065   0.027
 11    11.11    11   0.4345    0.036  -0.007   0.019  -0.008  -0.076  -0.101
 17    21.14    17   0.2202   -0.033  -0.025   0.027   0.140  -0.007   0.056
 23    24.04    23   0.4014    0.036  -0.048  -0.041  -0.015  -0.002   0.042
 29    27.77    29   0.5303   -0.024  -0.065  -0.063  -0.014  -0.006   0.020
 35    33.12    35   0.5593    0.023   0.007   0.011  -0.063   0.016   0.093
 41    44.70    41   0.3193    0.126   0.045  -0.038  -0.067  -0.047   0.055
 47    50.83    47   0.3252   -0.044  -0.057   0.062   0.072  -0.005  -0.033


Model for Variable LKINS

Estimated Intercept 0.001898
Period(s) of Differencing 1

Input Number 1

Input Variable LGOLD
Shift 1
Period(s) of Differencing 1

Numerator Factors
Factor 1: 0.43109 + 0.22837 B**(1)


An ARMA(2,1) model fits the error term. Make the final estimation of the transfer function with 
noise by replacing the ESTIMATE statement (the one with the PLOT option) with 

ESTIMATE P=2 Q=1 INPUT=(1$(1)LGOLD) METHOD=ML NOCONSTANT 
ALTPARM; 

RUN; 

Output 4.19 shows the results, and the model becomes

$\nabla \text{LKINS}_t = .49539(1 + .55B)\nabla \text{LGOLD}_{t-1} + \left((1 - .8877B) / (1 - 1.16325B + .47963B^2)\right) e_t$

Because you encountered a pure delay, this is an example of a leading indicator, although this term is 
generally reserved for economic data. More insight into the effect of this pure delay is obtained 
through the cross-spectral analysis in Chapter 7. 
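
To see what the ALTPARM estimates mean in terms of individual lags, the scale can be multiplied through the numerator factor. The Python sketch below uses the SCALE1 and NUM1,1 estimates from Output 4.19; the variable names are illustrative only.

```python
scale = 0.49539   # SCALE1 from Output 4.19
num1 = -0.55026   # NUM1,1; the numerator factor is (1 - num1 * B)

# Multiply the scale through the numerator: scale * (1 - num1 * B).
# With a shift of 1, the two weights apply to lags 1 and 2 of the input.
lag_weights = {1: scale, 2: -scale * num1}
total_gain = sum(lag_weights.values())
print({lag: round(w, 4) for lag, w in lag_weights.items()},
      round(total_gain, 4))
```

The weights are about 0.495 at lag 1 and 0.273 at lag 2, so a change in the differenced log flow at Goldsboro is passed on to Kinston over the next two days with a combined weight of about 0.77. The corresponding estimates without a noise model, 0.43109 and 0.22837 in Output 4.18, are somewhat smaller.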
Output 4.19 Estimating the Final Transfer Function: PROC ARIMA




FLOW RATES OF NEUSE RIVER AT GOLDSBORO AND KINSTON

The ARIMA Procedure

Maximum Likelihood Estimation

                         Standard             Approx
Parameter    Estimate       Error   t Value  Pr > |t|   Lag  Variable  Shift
MA1,1         0.88776     0.03506     25.32    <.0001     1  LKINS         0
AR1,1         1.16325     0.05046     23.05    <.0001     1  LKINS         0
AR1,2        -0.47963     0.04564    -10.51    <.0001     2  LKINS         0
SCALE1        0.49539     0.01847     26.83    <.0001     0  LGOLD         1
NUM1,1       -0.55026     0.04540    -12.12    <.0001     1  LGOLD         1

Variance Estimate      0.005838
Std Error Estimate     0.076407
AIC                    -909.399
SBC                     -889.48
Number of Residuals         397







Correlations of Parameter Estimates

Variable               LKINS     LKINS     LKINS     LGOLD     LGOLD
Parameter              MA1,1     AR1,1     AR1,2    SCALE1    NUM1,1
LKINS  MA1,1           1.000     0.478     0.126     0.108    -0.091
LKINS  AR1,1           0.478     1.000    -0.618     0.056    -0.097
LKINS  AR1,2           0.126    -0.618     1.000    -0.121     0.042
LGOLD  SCALE1          0.108     0.056    -0.121     1.000     0.590
LGOLD  NUM1,1         -0.091    -0.097     0.042     0.590     1.000





Autocorrelation Check of Residuals

 To     Chi-           Pr >
Lag   Square    DF    ChiSq   ---------------Autocorrelations---------------
  6     0.37     3   0.9468    0.005  -0.008  -0.010  -0.001   0.018   0.020
 12    10.07     9   0.3446   -0.043  -0.094   0.092  -0.020   0.044  -0.048
 18    20.04    15   0.1703   -0.030   0.004   0.036  -0.132   0.048   0.045
 24    31.10    21   0.0720    0.023   0.145   0.001   0.064  -0.005  -0.025
 30    33.30    27   0.1874    0.019  -0.010  -0.018  -0.056   0.002  -0.035
 36    37.66    33   0.2645    0.024  -0.007   0.045   0.044   0.052  -0.052
 42    39.40    39   0.4522    0.003   0.018   0.005   0.058  -0.005  -0.012
 48    47.78    45   0.3605    0.028   0.111   0.054   0.035   0.035  -0.010


Crosscorrelation Check of Residuals with Input LGOLD 


To 

Chi- 


Pr > 













Lag 

Square 

DF 

ChiSq 



. 


- -Crosscorrelations- - 

— 


. 

• - - - 

5 

6 

.41 

4 

0.1705 

-0 

.044 

-0 

.062 

0 

.010 

0 

.079 

0 

.054 

0 

.034 

11 

14 

.95 

10 

0.1340 

0 

.083 

0 

.038 

0 

.100 

0 

.042 

-0 

.031 

-0 

.024 

17 

28 

.12 

16 

0.0306 

0 

.041 

-0 

.002 

0 

.040 

0 

.130 

-0 

.051 

0 

.102 

23 

29 

.97 

22 

0.1192 

0 

.044 

-0 

.036 

0 

.006 

0 

.003 

0 

.003 

0 

.038 

29 

33 

.48 

28 

0.2185 

-0 

.047 

-0 

.053 

-0 

.056 

-0 

.006 

-0 

.026 

-0 

.002 

35 

40 

.58 

34 

0.2028 

-0 

.008 

-0 

.018 

-0 

.003 

-0 

.087 

0 

.045 

0 

.089 

41 

48 

.60 

40 

0.1650 

0 

.104 

0 

.040 

-0 

.008 

-0 

.025 

-0 

.001 

0 

.085 

47 

55 

.14 

46 

0.1674 

-0 

.069 

-0 

.032 

0 

.088 

0 

.048 

-0 

.019 

-0 

.017 


Model for Variable LKINS O 
Period(s) of Differencing 1 
No mean term in this model. 


Autoregressive Factors 

Factor 1: 1 - 1.16325 B**(1) + 0.47963 B**(2) 


Moving Average Factors 
Factor 1: 1 - 0.88776 B**(1) 


Input Number 1 

Input Variable LGOLD 
Shift 1 
Period(s) of Differencing 1 
Overall Regression Factor 0.495394 


Numerator Factors 
Factor 1: 1 + 0.55026 B**(1) 
4.3.3.4 Summary of Modeling Strategy

Follow these steps in case 3 to complete your modeling: 

1. Identify and estimate model for input X (IDENTIFY, ESTIMATE). 

2. Prewhiten Y and X using model from item 1 (IDENTIFY). 

3. Compute cross-correlations, $\gamma_{XY}(j)$, to identify transfer function form (IDENTIFY).

4. Fit transfer function and compute and analyze residuals (ESTIMATE, PLOT). 

5. Fit transfer function with noise model (ESTIMATE). 

6. Forecast X and Y (FORECAST). 

4.3.4 Case 3B: Intervention 

Suppose you use as an input X a sequence that is 0 through time 20 and 1 from time 21 onward. If
the model is

$Y_t = \alpha + \beta X_t + \text{noise}$

you have

$Y_t = \alpha + \text{noise}$

through time 20 and

$Y_t = \alpha + \beta + \text{noise}$

after time 20. Thus, Y experiences an immediate level shift (from $\alpha$ to $\alpha + \beta$) at time 21. Now change
the model to

$Y_t - \rho Y_{t-1} = \alpha + \beta X_t + \text{noise}$

or

$Y_t = \alpha' + \frac{\beta}{1 - \rho B} X_t + \text{noise}$

where $\alpha' = \alpha/(1 - \rho)$ (the expected value of Y when X is 0). You can also write

$Y_t = \alpha' + \beta(X_t + \rho X_{t-1} + \rho^2 X_{t-2} + \cdots) + \text{noise}$

At time 21, $X_{21} = 1$ and the previous Xs are 0, so

$Y_{21} = \alpha' + \beta + \text{noise}$

At time 22 you get

$Y_{22} = \alpha' + \beta(1 + \rho) + \text{noise}$

Y eventually approaches

$\alpha' + \beta(1 + \rho + \rho^2 + \cdots) = \alpha' + \beta/(1 - \rho)$

if you ignore the noise term. Thus, you see that ratios of polynomials in the backshift operator B can
provide interesting approaches to new levels.
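
The approach to the new level can be checked numerically. The following is a minimal sketch in Python (not SAS); the values of beta and rho are hypothetical and chosen only for illustration, and the noise term is ignored as in the text.

```python
def step_response(beta, rho, n_periods):
    """Shift in the level of Y caused by a unit step in X that is
    passed through beta / (1 - rho*B).

    Entry k is beta * (1 + rho + ... + rho**k), the shift k periods
    after the step begins.
    """
    effects = []
    level = 0.0
    weight = beta              # impulse-response weight beta * rho**k
    for _ in range(n_periods):
        level += weight        # a step input accumulates the weights
        effects.append(level)
        weight *= rho
    return effects


# Hypothetical values, for illustration only.
beta, rho = 2.0, 0.5
effects = step_response(beta, rho, 30)
limit = beta / (1 - rho)       # eventual shift in level

print(effects[:3])             # shifts at times 21, 22, and 23
print(limit)
```

With these values the first two shifts are beta and beta(1 + rho), and the later entries climb geometrically toward beta/(1 - rho).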
When you use an indicator input, you cannot prewhiten. Therefore, impulse-response weights are not
proportional to cross-covariances. You make the identification by comparing the behavior of $Y_t$ near
the intervention point with a catalog of typical behaviors for various transfer function forms. Several
such response functions for X=1 when t>20 and 0 otherwise are shown in Output 4.20.


Output 4.20 Plotting Intervention Models (figure)
Output 4.21 shows calls for directory assistance in Cincinnati, Ohio (McSweeny, 1978).


Output 4.21 Plotting the Original Data

(Figure: DIRECTORY ASSISTANCE, MONTHLY AVERAGE CALLS PER DAY. CALLS plotted against DATE, JAN62 through JAN78.)


Prior to March 1974 directory assistance was free, but from then on a charge was imposed. The
data seem to show an initial falling off of demand starting in February, which may be an anticipation
effect. The data clearly show an upward trend. You check the pre-intervention data for stationarity
with the code

PROC ARIMA DATA=CALLS; 

IDENTIFY VAR=CALLS STATIONARITY = (ADF=(2,3,12,13) ); 

IDENTIFY VAR=CALLS(1); 

ESTIMATE P=(12) METHOD=ML; 

WHERE DATE < '01FEB74'D; 

RUN; 

Some of the results are shown in Output 4.22. Only the trend tests are of interest since there is 
clearly a trend; however, none of the other tests could reject a unit root either. Tests with 12 and 13 
lagged differences are requested in anticipation of seasonality. Below this are the chi-square checks 
for a seasonal AR model for the first differences. The fit is excellent and the seasonal AR parameter 
0.5693 is not too close to 1. With this information you see that only the unit root tests with 12 or 
more lags are valid. 
Output 4.22 Unit Root Tests, Pre-intervention Calls Data

Augmented Dickey-Fuller Unit Root Tests

Type     Lags        Rho   Pr < Rho     Tau   Pr < Tau      F   Pr > F
Trend       2   -18.1708     0.0888   -2.48     0.3378   3.37   0.5037
            3   -31.1136     0.0045   -3.04     0.1252   4.84   0.2091
           12   124.1892     0.9999   -2.96     0.1489   4.70   0.2387
           13    52.3936     0.9999   -3.18     0.0924   5.46   0.0964

The ARIMA Procedure
Autocorrelation Check of Residuals

 To     Chi-            Pr >
Lag   Square    DF     ChiSq   --------------Autocorrelations--------------
  6     2.71     5    0.7451   -0.015  -0.001   0.065   0.019  -0.111  -0.030
 12     6.07    11    0.8683   -0.004   0.060  -0.039  -0.085   0.067  -0.067
 18    11.10    17    0.8511    0.018  -0.067   0.029  -0.095   0.035  -0.120
 24    16.64    23    0.8263   -0.002  -0.028   0.082  -0.149  -0.006   0.049


Model for Variable CALLS 


Estimated Mean 1.077355 

Period(s) of Differencing 1 

Autoregressive Factors 

Factor 1: 1 - 0.56934 B**(12) 


A first difference will reduce a linear trend to a constant, so calls tend to increase by 1.077 per
month. The intervention variable IMPACT is created, having value 1 from February 1974 onward.
Since the majority of the drop is seen in March, you fit an intervention model of the form
$(\beta_0 - \beta_1 B)X_t$, where $X_t$ is the IMPACT variable at time t. The first time $X_t$ is 1, the effect is $\beta_0$,
and after that both $X_t$ and $X_{t-1}$ will be 1 so that the effect is $\beta_0 - \beta_1$. You anticipate a negative $\beta_0$
and a larger-in-magnitude and positive $\beta_1$. A test that $\beta_0 = 0$ is a test for an anticipation effect.
Motivated by the pre-intervention analysis, you try the same seasonal AR(1) error structure and
check the diagnostics to see if it suffices. The code is as follows:

PROC ARIMA; 

IDENTIFY VAR=CALLS(1) CROSSCOR= (IMPACT(1)) NOPRINT; 

ESTIMATE INPUT = ((1)IMPACT) P=(12) METHOD=ML; 

RUN; 
It is seen in Output 4.23 that all terms except mu are significant. The trend part of the fitted model is
overlaid on the data in Output 4.24. Because the model has a unit root, the data can wander fairly far
from this trend, and this indeed happens. It also explains why the standard error for mu is so large;
that is, with random walk errors it is difficult to accurately estimate the drift term. Despite this, the
model seems to capture the intervention well and seems poised to offer an accurate forecast of the
next few values. The drop of 123 in calls the month prior to the charge is significant, so there was
an anticipation effect. An additional drop of 400 leaves the calls at 523 below the previous levels.
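
The arithmetic behind these two drops can be traced by applying the numerator polynomial to the indicator directly. This is a small Python sketch (not part of the SAS analysis); the short IMPACT series is made up for illustration, while the two coefficients are the NUM1 and NUM1,1 estimates reported in Output 4.23.

```python
def intervention_effect(x, beta0, beta1):
    """Apply (beta0 - beta1*B) to the indicator series x."""
    effect = []
    for t, x_t in enumerate(x):
        x_prev = x[t - 1] if t > 0 else 0   # B shifts the series back one period
        effect.append(beta0 * x_t - beta1 * x_prev)
    return effect


# Illustrative IMPACT series: 0 before February 1974, 1 from then on.
impact = [0, 0, 1, 1, 1]
beta0, beta1 = -123.18861, 400.69122        # NUM1 and NUM1,1 in Output 4.23

effect = intervention_effect(impact, beta0, beta1)
print(effect)
```

The first month with IMPACT=1 shows only the anticipation effect (about -123); every later month shows the full shift beta0 - beta1, roughly -524 with the unrounded estimates.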


Output 4.23 PROC ARIMA for Calls Data

DIRECTORY ASSISTANCE, MONTHLY AVERAGE CALLS PER DAY

The ARIMA Procedure

Maximum Likelihood Estimation

                            Standard             Approx
Parameter      Estimate        Error   t Value   Pr > |t|   Lag   Variable   Shift
MU              2.32863      2.88679      0.81     0.4199     0   CALLS          0
AR1,1           0.45045      0.06740      6.68     <.0001    12   CALLS          0
NUM1         -123.18861     20.39502     -6.04     <.0001     0   IMPACT         0
NUM1,1        400.69122     20.34270     19.70     <.0001     1   IMPACT         0

Constant Estimate       1.279694
Variance Estimate       503.7929
Std Error Estimate      22.44533
AIC                     1619.363
SBC                      1632.09
Number of Residuals          178

Autocorrelation Check of Residuals

 To     Chi-            Pr >
Lag   Square    DF     ChiSq   --------------Autocorrelations--------------
  6     3.43     5    0.6342    0.009  -0.046   0.058   0.016  -0.109  -0.030
 12     7.91    11    0.7209   -0.026   0.048  -0.022  -0.110   0.015  -0.088
 18    11.47    17    0.8312   -0.024  -0.093   0.021  -0.055   0.025  -0.068
 24    19.83    23    0.6518    0.006   0.001   0.062  -0.162  -0.029   0.098
 30    22.73    29    0.7886   -0.021   0.019   0.016  -0.025  -0.026   0.105
Output 4.24 Effect of Charge for Directory Assistance

(Figure: DIRECTORY ASSISTANCE, MONTHLY AVERAGE CALLS PER DAY. CALLS plotted against DATE.)


To forecast the next few months, you extend the data set with missing values for calls and set the 
intervention variable to 1, assuming the charge will remain in effect. The code below produces the 
plot in Output 4.25. Note how the forecasts and intervals for the historical data have been deleted 
from the plot. The intervals are quite wide due to the unit root structure of the errors. Recall that even 
the historical data have produced some notable departures from trend. Adding other predictor 
variables, like population or new phone installations, might help reduce the size of these intervals, 
but the predictors would need to be extrapolated into the future. 

DATA EXTRA; 

DO T=1 TO 24; 

DATE = INTNX('MONTH','01DEC76'D,T); 

IMPACT=1; 

OUTPUT; 

END; 

RUN; 

DATA ALL; 

SET CALLS EXTRA; 

RUN; 

PROC ARIMA; 

IDENTIFY VAR=CALLS(1) CROSSCOR=( IMPACT(1) ) NOPRINT; 

ESTIMATE INPUT = ( (1) IMPACT ) P=(12) METHOD=ML NOPRINT; 

FORECAST LEAD=24 OUT=GRAPH ID=DATE INTERVAL=MONTH; 

RUN; 

DATA GRAPH; 

SET GRAPH; 

IF CALLS NE . THEN DO; 

FORECAST=.; U95=.; L95=.; 

END; 

RUN; 

PROC GPLOT DATA=GRAPH; 

PLOT (CALLS FORECAST U95 L95)*DATE/OVERLAY; 

SYMBOL1 V=NONE I=JOIN R=4; 

TITLE "FORECASTED CALLS";

RUN; 
Output 4.25 Forecasts from Intervention Model

(Figure: FORECASTED CALLS plotted against DATE, JAN60 through JAN80.)


4.4 Further Examples 


4.4.1 North Carolina Retail Sales 

Consider again the North Carolina retail sales data investigated in Chapter 1. Recall that there the 
quarterly sales increases were modeled using seasonal dummy variables; that is, seasonal dummy 
variables were fit to the first differences of quarterly sales. The models discussed in this section 
potentially provide an alternative approach. Here the full monthly data (from which the quarterly 
numbers were computed as averages) will be used. This is an example in which the airline model 
seems a good choice at first, but later runs into some problems. Recall that when a first difference is 
found, often a moving average at lag 1 is appropriate. Likewise, a multiplicative moving average
structure, specified by ESTIMATE Q = (1)(12), often works well when the first and span 12
difference, $(Y_t - Y_{t-1}) - (Y_{t-12} - Y_{t-13})$, has been taken. You can think of these moving average terms
as somewhat mitigating the impact of the rather heavy-handed differencing operator. As in the IBM
example in Section 3.4.7, the fitting of these moving average terms causes forecasts to be weighted 
averages of seasonal patterns over all past years where the weights decrease exponentially as you 
move further into the past. Thus the forecast is influenced somewhat by all past patterns but most 
substantially by those of the most recent years. 

The airline model just discussed will be written here as

$(1 - B)(1 - B^{12})Y_t = (1 - \theta_{1,1}B)(1 - \theta_{2,1}B^{12})e_t$,

introducing double subscripts to indicate which factor and which lag within that factor is being
modeled. This double-subscript notation corresponds to PROC ARIMA output. The airline model is
often a good first try when seasonal data are encountered. Now if, for example, $\theta_{2,1} = 1$, then there is
cancellation on both sides of the model and it reduces to $(1 - B)Y_t = (1 - \theta_{1,1}B)e_t$. Surprisingly, this
can happen even with strongly seasonal data. If it does, as it will for the retail sales, it suggests
considering a model outside the ARIMA class. Consider a model $Y_t = \mu_t + S_t + Z_t$ where $S_t = S_{t-12}$,
and $Z_t$ has some ARIMA structure, perhaps even having unit roots. Note that $S_t$ forms an exactly
repeating seasonal pattern, as would be modeled using dummy variables. Because of $S_t$ the
autocorrelation function will have spikes at lag 12, as will that of the ordinary first differences since
$S_t - S_{t-1}$ is also periodic. However, the span 12 difference $Y_t - Y_{t-12}$ will involve $(1 - B^{12})Z_t$, and
unless $Z_t$ has a unit root at lag 12, estimates of the coefficient of $Z_{t-12}$ will be forced toward the
moving average boundary. This overdifferencing often results in failure to converge.

You issue the following SAS statements to plot the data and compute the ACF of the original series, 
first differenced series, and first and seasonally differenced series. 

PROC GPLOT DATA=NCRETAIL; 

PLOT SALES*DATE/HMINOR=0 

HREF='01DEC83'D '01DEC84'D '01DEC85'D '01DEC86'D
'01DEC87'D '01DEC88'D '01DEC89'D '01DEC90'D
'01DEC91'D '01DEC92'D '01DEC93'D '01DEC94'D;

TITLE "NORTH CAROLINA RETAIL SALES"; 

TITLE2 "IN MILLIONS"; 

RUN; 

PROC ARIMA DATA=NCRETAIL ; 

IDENTIFY VAR=SALES OUTCOV=LEVELS NLAG=36; 

IDENTIFY VAR=SALES(1) OUTCOV=DIFF NLAG=36; 

IDENTIFY VAR=SALES(1,12) OUTCOV=SEAS NLAG=36; 

RUN; 


The data plot is shown in Output 4.26. The ACF, IACF, and PACF have been saved with the
OUTCOV= option. Output 4.27 uses this with SAS/GRAPH and a template to produce a matrix of
plots with rows representing original, (1), and (1,12) differenced data and columns representing, 
from left to right, the ACF, IACF, and PACF. 
Output 4.26 Plotting the Original Data

(Figure: NORTH CAROLINA RETAIL SALES, IN MILLIONS. Sales plotted against date.)

Output 4.27 Computing the ACF with the IDENTIFY Statement: PROC ARIMA (figure)



The plot of the data displays nonstationary behavior (nonconstant mean). The original ACF shows 
slow decay, indicating a first differencing. The ACF of the differenced series shows somewhat slow 
decay at the seasonal lags, indicating a possible span 12 difference. The Q statistics and ACF on the 
SALES(1,12) differenced variable indicate that some MA terms are needed, with the ACF spikes at 1 
and 12 indicating MA terms at lags 1 and 12. Heeding the remarks at the beginning of this section, 
you try a multiplicative structure even though the expected side lobes at 11 and 13 (that such a 
structure implies) are not evident in the ACF. Such a structure also serves as a check on the 
differencing, as you will see. Adding 

ESTIMATE Q=(1)(12) ML; 

to the above code requests that maximum likelihood estimates of the multiplicative MA be fitted to 
the first and span 12 differenced data. The results are in Output 4.28. 
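
The side lobes at lags 11 and 13 mentioned above come from the cross-product term in the multiplicative moving average. The following Python sketch (not SAS) expands the product and computes the implied autocovariances for unit-variance shocks; the two theta values are hypothetical and are not the estimates for these data.

```python
def ma_autocovariances(coefs, max_lag):
    """Autocovariances of an MA process with unit-variance shocks.

    coefs[j] multiplies e(t-j); gamma(k) is the sum over j of
    coefs[j] * coefs[j+k].
    """
    return [
        sum(coefs[j] * coefs[j + k] for j in range(len(coefs) - k))
        for k in range(max_lag + 1)
    ]


# Hypothetical coefficients, for illustration only.
theta1, theta2 = 0.4, 0.6

# (1 - theta1*B)(1 - theta2*B**12)
#   = 1 - theta1*B - theta2*B**12 + theta1*theta2*B**13
coefs = [1.0, -theta1] + [0.0] * 10 + [-theta2, theta1 * theta2]

gamma = ma_autocovariances(coefs, 15)
nonzero_lags = [k for k in range(1, 16) if abs(gamma[k]) > 1e-12]
print(nonzero_lags)
```

Besides lags 1 and 12, the product structure implies equal, nonzero autocovariances at lags 11 and 13, which is why their absence from the sample ACF is worth noting.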


Output 4.28 Fitting the Multiplicative MA Structure

WARNING: The model defined by the new estimates is unstable. The iteration
process has been terminated.

WARNING: Estimates may not have converged.

ARIMA Estimation Optimization Summary

Estimation Method                           Maximum Likelihood
Parameters Estimated                        3
Termination Criteria                        Maximum Relative Change in Estimates
Iteration Stopping Value                    0.001
Criteria Value                              69.27824
Maximum Absolute Value of Gradient          365546.8
R-Square Change from Last Iteration         0.13941
Objective Function                          Log Gaussian Likelihood
Objective Function Value                    -923.09
Marquardt's Lambda Coefficient              0.00001
Numerical Derivative Perturbation Delta     0.001
Iterations                                  8
Warning Message                             Estimates may not have converged.

The ARIMA Procedure
Maximum Likelihood Estimation

                          Standard             Approx
Parameter    Estimate        Error   t Value   Pr > |t|   Lag
MU           -0.65905      1.52987     -0.43     0.6666     0
MA1,1         0.74136      0.05997     12.36     <.0001     1
MA2,1         0.99979     83.86699      0.01     0.9905    12

Constant Estimate       -0.65905
Variance Estimate       62632.24
Std Error Estimate      250.2643
AIC                      1852.18
SBC                     1860.805
Number of Residuals          131

Correlations of Parameter Estimates

Parameter        MU     MA1,1     MA2,1
MU            1.000     0.205    -0.117
MA1,1         0.205     1.000    -0.089
MA2,1        -0.117    -0.089     1.000

Autocorrelation Check of Residuals

 To     Chi-            Pr >
Lag   Square    DF     ChiSq   --------------Autocorrelations--------------
  6     9.12     4    0.0582   -0.131  -0.056   0.175   0.046   0.115   0.032
 12    16.16    10    0.0950    0.047   0.115   0.062   0.120  -0.047   0.114
 18    22.96    16    0.1149    0.009   0.003   0.211   0.014  -0.001  -0.014
 24    33.26    22    0.0583    0.075  -0.026  -0.074   0.163   0.087  -0.137


Model for variable sales 


Estimated Mean -0.65905 

Period(s) of Differencing 1,12 

Moving Average Factors 

Factor 1: 1 - 0.74136 B**(1) 
Factor 2: 1 - 0.99979 B**(12) 


In Output 4.28, you see that there seems to be a problem. The procedure had trouble converging, the 
standard error on the lag 12 coefficient is extremely large, and the estimate itself is almost 1, 
indicating a possibly noninvertible model. You can think of a near 1.00 moving average coefficient at 
lag 12 as trying to undo the span 12 differencing. Of course, trying to make inferences when 
convergence has not been verified is, at best, questionable. Returning to the discussion at the opening 
of this section, a possible explanation is that the seasonality $S_t$ is regular enough to be accounted for

by seasonal dummy variables. That scenario is consistent with all that has been observed about these 
data. The first difference plus dummy variable model of section 1 did seem to fit the data pretty well. 

The dummy variables can be incorporated in PROC ARIMA using techniques in Section 4.2. 

Letting $S_{1t}$ through $S_{11,t}$ denote monthly indicator variables (dummy variables), your model is

$Y_t = \alpha + \beta t + \delta_1 S_{1t} + \delta_2 S_{2t} + \cdots + \delta_{11} S_{11,t} + Z_t$

where, from your previous modeling, $Z_t$ seems to have a (nonseasonal) unit root. You interpret
$\alpha + \beta t$ as a “December line” in that, for December, each $S_{jt}$ is 0. For January, the expected value of
Y is $(\alpha + \delta_1) + \beta t$; that is, $\delta_1$ is a shift in the trend line that is included for all January data and
similar $\delta_j$ values allow shifts for the other 10 months up through November. Because Christmas
sales are always relatively high, you anticipate that all these $\delta$s, and especially $\delta_1$, will be negative.

Using $\nabla$ to denote a first difference, write the model at time t and at time t - 1, then subtract to get

$\nabla Y_t = \nabla\alpha + \beta(\nabla t) + \delta_1 \nabla S_{1t} + \delta_2 \nabla S_{2t} + \cdots + \delta_{11} \nabla S_{11,t} + \nabla Z_t$

Now $\nabla Z_t$ is stationary if $Z_t$ has a unit root, $\nabla\alpha = \alpha - \alpha = 0$, and $\nabla t = t - (t - 1) = 1$. Since errors
should be stationary for proper modeling in PROC ARIMA, the model will be specified in first
differences as

$\nabla Y_t = \beta + \delta_1 \nabla S_{1t} + \delta_2 \nabla S_{2t} + \cdots + \delta_{11} \nabla S_{11,t} + \nabla Z_t$

The parameters have the same interpretations as before. This code fits the model with $\nabla Z_t$ specified
as ARMA(2,1) and plots forecasts. The data set had 24 missing values for sales at the end with
seasonal indicator variables $S_{jt}$ nonmissing. Note that the seasonal indicator variables can be
generated without error and so are valid deterministic inputs.
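
The differencing step can be verified on the deterministic part of the model. The Python sketch below (not SAS) uses made-up values for the intercept, slope, and seasonal shifts; only the structure, a December baseline with 11 monthly indicators, follows the text, and the error term is left out.

```python
# Hypothetical parameter values, for illustration only.
ALPHA, BETA = 5000.0, 25.0
DELTA = [-1000.0, -1100.0, -600.0, -480.0, -400.0, -260.0,
         -370.0, -420.0, -440.0, -640.0, -470.0]   # January..November shifts


def seasonal_dummies(t):
    """Indicators S_1..S_11 at time t (t = 0 is a January); December is all zeros."""
    month = t % 12
    return [1.0 if month == j else 0.0 for j in range(11)]


def trend_plus_seasonal(t):
    """alpha + beta*t + sum of delta_j * S_jt (the error term Z_t is omitted)."""
    s = seasonal_dummies(t)
    return ALPHA + BETA * t + sum(d * s_j for d, s_j in zip(DELTA, s))


gaps = []
for t in range(1, 37):
    lhs = trend_plus_seasonal(t) - trend_plus_seasonal(t - 1)
    diff_s = [now - before
              for now, before in zip(seasonal_dummies(t), seasonal_dummies(t - 1))]
    rhs = BETA + sum(d * ds for d, ds in zip(DELTA, diff_s))
    gaps.append(abs(lhs - rhs))

max_gap = max(gaps)
print(max_gap)
```

The first difference of the level model agrees with the slope plus the differenced indicators in every month, so the intercept of the differenced model plays the role of the slope and the seasonal coefficients keep their meaning.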

The following code produces Output 4.29 and Output 4.30. 

PROC ARIMA DATA=NCRETAIL;

IDENTIFY VAR=SALES(1) CROSSCOR =

(S1(1) S2(1) S3(1) S4(1) S5(1) S6(1) S7(1) S8(1) S9(1)

S10(1) S11(1) )

NOPRINT;

ESTIMATE INPUT=(S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11) p=2 q=1 ml;
FORECAST LEAD=24 OUT=OUT1 ID=DATE INTERVAL=MONTH;

RUN;

PROC GPLOT DATA=OUT1;

PLOT (SALES L95 U95 FORECAST)*DATE/

OVERLAY HREF='01DEC94'D; WHERE DATE > '01JAN90'D;

SYMBOL1 V=NONE I=JOIN C=BLACK L=1 R=1 W=1;

SYMBOL2 V=NONE I=JOIN C=BLACK L=2 R=2 W=1;

SYMBOL3 V=NONE I=JOIN C=BLACK L=1 R=1 W=2;

RUN;

Output 4.29 Seasonal Model for North Carolina Retail Sales

NC RETAIL SALES
2 YEARS OF FORECASTS

The ARIMA Procedure

Maximum Likelihood Estimation

                            Standard             Approx
Parameter      Estimate        Error   t Value   Pr > |t|   Lag   Variable   Shift
MU             26.82314      5.89090      4.55     <.0001     0   sales          0
MA1,1           0.50693      0.14084      3.60     0.0003     1   sales          0
AR1,1          -0.39666      0.14155     -2.80     0.0051     1   sales          0
AR1,2          -0.29811      0.12524     -2.38     0.0173     2   sales          0
NUM1            -1068.4     95.36731    -11.20     <.0001     0   S1             0
NUM2            -1092.1     93.36419    -11.70     <.0001     0   S2             0
NUM3         -611.48245     82.36315     -7.42     <.0001     0   S3             0
NUM4         -476.60662     89.45164     -5.33     <.0001     0   S4             0
NUM5         -396.94536     90.59402     -4.38     <.0001     0   S5             0
NUM6         -264.63164     87.94063     -3.01     0.0026     0   S6             0
NUM7         -371.30277     90.53160     -4.10     <.0001     0   S7             0
NUM8         -424.65711     88.92377     -4.78     <.0001     0   S8             0
NUM9         -440.79196     81.05429     -5.44     <.0001     0   S9             0
NUM10        -642.50812     92.86014     -6.92     <.0001     0   S10            0
NUM11        -467.54818     94.61205     -4.94     <.0001     0   S11            0
Output 4.29 Seasonal Model for North Carolina Retail Sales (continued)

Constant Estimate       45.45892
Variance Estimate       57397.62
Std Error Estimate       239.578
AIC                     1988.001
SBC                     2032.443
Number of Residuals          143

Autocorrelation Check of Residuals

 To     Chi-            Pr >
Lag   Square    DF     ChiSq   --------------Autocorrelations--------------
  6     1.88     3    0.5973   -0.009  -0.021   0.008   0.001   0.108   0.016
 12     7.55     9    0.5801    0.034   0.119   0.057   0.108  -0.076   0.024
 18    16.08    15    0.3770    0.013   0.030   0.222   0.030   0.014  -0.037
 24    27.84    21    0.1446    0.013  -0.040  -0.020   0.168   0.096  -0.169

Model for Variable SALES

Estimated Intercept          26.82314
Period(s) of Differencing    1

Autoregressive Factors
Factor 1: 1 + 0.39666 B**(1) + 0.29811 B**(2)

Moving Average Factors
Factor 1: 1 - 0.50693 B**(1)

Input     Input      Period(s) of    Overall Regression
Number    Variable   Differencing    Factor
   1      S1              1          -1068.41
   2      S2              1          -1092.13
   3      S3              1          -611.482
   4      S4              1          -476.607
   5      S5              1          -396.945
   6      S6              1          -264.632
   7      S7              1          -371.303
   8      S8              1          -424.657
   9      S9              1          -440.792
  10      S10             1          -642.508
  11      S11             1          -467.548



Forecasts for Variable SALES

Obs     Forecast   Std Error    95% Confidence Limits
145    6769.7677    239.5780    6300.2034    7239.3320
146    6677.4766    240.6889    6205.7350    7149.2182
147    7438.1332    243.5997    6960.6865    7915.5799
148    7527.8413    261.9620    7014.4053    8041.2774
149    7587.4027    270.8251    7056.5952    8118.2102
150    7786.6130    277.8644    7242.0088    8331.2172
151    7704.8579    287.2914    7141.7771    8267.9387
152    7667.1369    295.8507    7087.2802    8246.9935
153    7682.8322    303.6424    7087.7040    8277.9604
154    7509.2889    311.5971    6898.5697    8120.0081
155    7709.0439    319.3641    7083.1018    8334.9861
156    8203.8173    326.8401    7563.2225    8844.4121
157    7162.6783    334.1872    6507.6835    7817.6731
158    7165.4868    341.3916    6496.3715    7834.6021
159    7672.9379    348.4302    6990.0272    8355.8485
160    7834.7312    355.3315    7138.2942    8531.1682
161    7941.1827    362.1054    7231.4692    8650.8962
162    8100.3044    368.7526    7377.5626    8823.0463
163    8020.4722    375.2818    7284.9333    8756.0111
164    7993.9393    381.7001    7245.8207    8742.0578
165    8004.6236    388.0121    7244.1338    8765.1133
166    7829.7326    394.2228    7057.0701    8602.3952
167    8031.5161    400.3374    7246.8692    8816.1629
168    8525.8866    406.3599    7729.4359    9322.3374



Output 4.30 shows the resulting graph. 
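
The growth of the standard errors in the forecast table of Output 4.29 can be reproduced from the fitted error model alone. The Python sketch below (not SAS) assumes the usual psi-weight formula for ARIMA forecast errors, in which the h-step standard error is sigma times the square root of the sum of the first h squared psi weights; the coefficients and sigma are copied from Output 4.29, and the function names are illustrative.

```python
import math


def poly_multiply(a, b):
    """Multiply two polynomials in B given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, a_i in enumerate(a):
        for j, b_j in enumerate(b):
            out[i + j] += a_i * b_j
    return out


def psi_weights(ar_poly, ma_poly, n):
    """First n coefficients of ma_poly(B) / ar_poly(B); ar_poly[0] must be 1."""
    psi = []
    for k in range(n):
        value = ma_poly[k] if k < len(ma_poly) else 0.0
        for i in range(1, min(k, len(ar_poly) - 1) + 1):
            value -= ar_poly[i] * psi[k - i]
        psi.append(value)
    return psi


sigma = 239.578                              # Std Error Estimate
ar_factor = [1.0, 0.39666, 0.29811]          # 1 + 0.39666 B + 0.29811 B**2
ma_factor = [1.0, -0.50693]                  # 1 - 0.50693 B
full_ar = poly_multiply([1.0, -1.0], ar_factor)   # include the first difference

psi = psi_weights(full_ar, ma_factor, 4)
std_errors = []
running = 0.0
for weight in psi:
    running += weight ** 2
    std_errors.append(sigma * math.sqrt(running))

print([round(se, 4) for se in std_errors])
```

Because the differencing factor puts a unit root in the autoregressive polynomial, the psi weights do not die out, so the sum of squares and hence the interval width keeps growing with the lead time.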


Output 4.30 Forecasts from Seasonal Model

(Figure: NC RETAIL SALES, 2 YEARS OF FORECASTS. Sales, forecast for sales, and lower and upper 95% confidence limits plotted against date.)



4.4.2 Construction Series Revisited 

Returning to the construction worker series at the beginning of Section 4.1.2, you can fit two models 
both having a first difference. Let one incorporate a seasonal difference and the other incorporate 
seasonal dummy variables S1 through S12 to model the seasonal pattern. This code produces two
forecast data sets, OUTDUM and OUTDIF, that have 24 forecasts from the two models. The data set
ALL has the original construction data along with seasonal dummy variables S1 through S12 that
extend 24 periods into the future. In Section 4.4.1 the December indicator S12 was dropped to avoid 
a collinearity problem involving the intercept. An equally valid approach is to drop the intercept 
(NOCONSTANT) and retain all 12 seasonal indicators. That approach is used here. 

PROC ARIMA DATA=ALL; 

IDENTIFY VAR=CONSTRCT NLAG=36 NOPRINT; 

IDENTIFY VAR=CONSTRCT(1) STATIONARITY=(ADF=(1,2,3) DLAG=12); 

IDENTIFY VAR=CONSTRCT(1) NOPRINT 

CROSSCOR = (S1(1) S2(1) S3(1) S4(1) S5(1) S6(1)

S7(1) S8(1) S9(1) S10(1) S11(1) S12(1) );

ESTIMATE INPUT = (S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 )

NOCONSTANT METHOD=ML NOPRINT; 

FORECAST LEAD=24 ID=DATE INTERVAL=MONTH OUT=OUTDUM; 

IDENTIFY VAR=CONSTRCT(1,12) NOPRINT; 

ESTIMATE NOCONSTANT METHOD=ML NOPRINT; 

FORECAST LEAD=24 INTERVAL=MONTH ID=DATE OUT=OUTDIF NOPRINT; 

RUN; 
Output 4.31 Seasonal Dummy and Seasonal Difference Forecasts

(Figure: CONSTRUCTION REVIEW, CONSTRUCTION WORKERS IN THOUSANDS. CONSTRCT plotted against DATE, JAN77 through JAN85.)



In Output 4.31 the forecast data sets have been merged and forecasts 24 periods ahead have been 
plotted. The forecasts and intervals for the span 12 differenced series are shown as darker lines 
labeled “D,” and those for the dummy variable model are shown as lighter lines with a dot label on 
the far right. The forecasts are quite different. The seasonally differenced series gives much wider 
intervals and a general pattern of decline. The seasonal dummy variables produce forecast intervals 
that are less pessimistic and, 24 periods into the future, are about half the width of the others. Of
course, wide intervals are expected with differencing. Is there a way to see which model is more
appropriate? The chi-square statistics for both models show no problems with the models. Note the
code STATIONARITY=(ADF=(1,2,3) DLAG=12) for the first differenced series. This DLAG=12
option requests a seasonal unit root test. Dickey, Hasza, and Fuller (1984) develop this and other 
seasonal unit root tests. Output 4.32 shows the results, and the tau statistics give some evidence 
against the null hypothesis of a seasonal unit root. 
Output 4.32 Seasonal Unit Root Tests for Construction Data

Seasonal Augmented Dickey-Fuller Unit Root Tests

Type           Lags       Rho   Pr < Rho     Tau   Pr < Tau
Zero Mean         1   -6.2101     0.1810   -1.77     0.0499
                  2   -7.4267     0.1389   -2.09     0.0251
                  3   -7.7560     0.1291   -2.20     0.0198
Single Mean       1   -5.7554     0.2606   -1.61     0.0991
                  2   -6.9994     0.2068   -1.95     0.0541
                  3   -7.3604     0.1930   -2.07     0.0428


The seasonal dummy variable model does not lose as much data to differencing, is a little easier to 
understand, has narrower intervals, and does more averaging of past seasonal behavior. In fact, the
first and span 12 difference model has forecast $\hat{Y}_t = Y_{t-1} + (Y_{t-12} - Y_{t-13})$, so the forecast for this
August is just this July’s value with last year’s July-to-August change added in. The forecast
effectively makes a copy of last year’s seasonal pattern and attaches it to the end of the series as a 
forecast. Without moving average terms, last year’s pattern alone gives the forecast. For these data, 
these comments along with the fact that the data themselves reject the seasonal difference suggest the 
use of the dummy variable model. 
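
The copying behavior of the first and span 12 difference forecast is easy to see in a small recursion. The Python sketch below (not SAS) assumes no moving average terms, as in the fitted model, and uses a made-up two-year monthly series; the function name is illustrative.

```python
def forecast_first_and_span12(history, lead):
    """Forecast recursion Y(t) = Y(t-1) + Y(t-12) - Y(t-13), with no MA terms.

    history must hold at least 13 values; forecasts feed back into the
    recursion once actual values run out.
    """
    values = list(history)
    for _ in range(lead):
        values.append(values[-1] + values[-12] - values[-13])
    return values[len(history):]


# Made-up monthly series: linear trend plus a fixed pattern and some irregularity.
pattern = [0, -4, 3, 8, 12, 15, 14, 13, 9, 5, 1, -2]
bumps = [0, 1, -1, 2, 0, -2, 1, 0, 3, -1, 0, 2,
         1, -1, 0, 2, -2, 1, 0, 1, -1, 0, 2, -3]
history = [100 + 3 * t + pattern[t % 12] + bumps[t] for t in range(24)]

forecasts = forecast_first_and_span12(history, 12)

last_year_changes = [history[-12 + k] - history[-13 + k] for k in range(12)]
forecast_changes = [forecasts[0] - history[-1]] + [
    forecasts[k] - forecasts[k - 1] for k in range(1, 12)
]
print(forecast_changes == last_year_changes)
```

Each forecast month-to-month change equals the corresponding change from the last observed year, irregularities included, which is the sense in which last year's pattern alone gives the forecast.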


4.4.3 Milk Scare (Intervention) 

Liu et al. (1998) discuss milk sales in Oahu, Hawaii, during a time period in which the discovery of 
high pesticide levels in milk was publicized. Liu (personal communication) provided the data here. 
The data indicate April 1982 as the month of first impact, although some tainted milk was found in 
March. Output 4.33 shows a graph with March, April, and May 1982 indicated by dots. Ultimately
eight recalls were issued and publicized, with over 36 million pounds of contaminated milk found. It 
might be reasonable to expect a resulting drop in milk sales that may or may not have a long-term 
effect. It appears that, with the multiple recalls and escalating publicity, the full impact was not 
realized until May 1982, after which recovery began. 

Initially a model was fit to the data before the intervention. A seasonal pattern was detected, but no 
ordinary or seasonal differencing seemed necessary. A P=(1)(12) specification left a somewhat large
correlation at lag 2, so Q=(2) was added and the resulting model fit the pre-intervention data nicely. 
The intervention response seemed to show an arbitrary value after the first drop, in fact another drop, 
followed by exponential increase upward. The second drop suggests a numerator lag and the 
exponential increase suggests a denominator lag in the transfer function operator. X is a variable that 
is 1 for April 1982 and 0 otherwise. The following code produces an intervention model with this 
pattern. 

PROC ARIMA DATA=LIU; 

IDENTIFY VAR=SALES NOPRINT CROSSCOR=(X); 

ESTIMATE INPUT=( (1) /(1) X ) P=(1)(12) Q=(2) METHOD=ML; 

RUN; 

Output 4.33 and Output 4.34 show the results. 
Output 4.33  Effect of Tainted Milk (graph)

Output 4.34  Model for Milk Sales Intervention

        Effect of Negative Publicity (Contamination) on Milk Sales

                          The ARIMA Procedure

                     Maximum Likelihood Estimation

                         Standard              Approx
Parameter    Estimate       Error   t Value   Pr > |t|   Lag   Variable   Shift
MU           83.01550     3.98010     20.86     <.0001     0   sales          0
MA1,1        -0.34929     0.11980     -2.92     0.0035     2   sales          0
AR1,1         0.53417     0.10566      5.06     <.0001     1   sales          0
AR2,1         0.78929     0.06964     11.33     <.0001    12   sales          0
NUM1        -39.89641     2.83209    -14.09     <.0001     0   X              0
NUM1,1       49.55934     2.96258     16.73     <.0001     1   X              0
DEN1,1        0.61051     0.03289     18.56     <.0001     1   X              0

Constant Estimate       8.148426
Variance Estimate       15.88348
Std Error Estimate      3.985408
AIC                     450.5909
SBC                     466.9975
Number of Residuals           77

                  Autocorrelation Check of Residuals

 To     Chi-           Pr >
Lag   Square    DF    ChiSq    ---------------Autocorrelations---------------
  6     3.98     3   0.2636    -0.027   0.030  -0.015   0.110   0.019   0.180
 12    14.35     9   0.1104     0.116   0.141   0.149   0.000   0.235  -0.063
 18    18.96    15   0.2157     0.052   0.133  -0.006   0.005   0.154  -0.046
 24    23.01    21   0.3434     0.118   0.064   0.035   0.017   0.010  -0.131


By specifying INPUT=( (1)/(1) X ), where X is 1 for April 1982 and 0 otherwise, you are fitting an 
intervention model whose form is 

$$\frac{\beta_0 - \beta_1 B}{1 - \alpha_1 B}\, X_t$$

Filling in the estimates, you have 

$$\frac{-40 - 50B}{1 - 0.61B}\, X_t = (-40 - 50B)(1 + 0.61B + 0.61^2 B^2 + \cdots)\, X_t$$

$$= -40 X_t - 74.4 X_{t-1} + 0.61(-74.4) X_{t-2} + 0.61^2(-74.4) X_{t-3} + \cdots$$

so when $X_t$ is 1, the estimated effect is -40. The next month, $X_{t-1}$ is 1 and the effect is -74.4. 

Two months after the intervention the estimated effect is 0.61(-74.4) as recovery begins. This model 
forces a return to the original level. In Output 4.33 a horizontal line at the intercept 83 has been 
drawn and the intervention effects -40, -74.4, and so on, have been added in. Notice how the 
intercept line underestimates the pre-intervention level, and how the estimated recovery seems faster 
than the data suggest. Had you plotted the forecasts, including the autoregressive components, this 
failure of the mean structure in the model might not have been noticed. The importance of plotting 
cannot be overemphasized. It is a critical component of data analysis. Note also that the statistics in 
Output 4.34 give no warning signs of any problems. Again one might think of the autoregressive 
structure as compensating for some lack of fit. 
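
The expansion of the transfer function can be checked numerically. The following Python sketch is not part of the SAS analysis; the function name `impulse_response` is introduced here only for illustration, and the inputs are the rounded estimates -40, 50, and 0.61 quoted in the text. It generates the weights of $(\beta_0 - \beta_1 B)/(1 - \alpha_1 B)$ by recursion and adds them to the intercept 83, which is how the mean line in Output 4.33 is described as being built.

```python
def impulse_response(beta0, beta1, alpha1, n_terms):
    """Weights on X_t, X_{t-1}, ... implied by (beta0 - beta1*B)/(1 - alpha1*B)."""
    weights = [beta0]                             # effect in the intervention month
    if n_terms > 1:
        weights.append(-beta1 + alpha1 * beta0)   # effect one month later
    while len(weights) < n_terms:
        weights.append(alpha1 * weights[-1])      # geometric decay afterward
    return weights[:n_terms]


# Rounded estimates from Output 4.34: NUM1 = -40, NUM1,1 = 50, DEN1,1 = 0.61
effects = impulse_response(-40.0, 50.0, 0.61, 6)
for months_after, effect in enumerate(effects):
    print(f"{months_after} months after April 1982: "
          f"effect {effect:8.2f}, fitted mean {83 + effect:6.2f}")
```

Because every weight after the second is 0.61 times the previous one, the effects shrink toward 0, which is the sense in which this model forces a return to the original level.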

Might there be some permanent effect of this incident? The model now under consideration does not 
allow it. To investigate this, you add a level shift variable. 

Define the variable LEVEL1 to be 1 prior to April 1982 and 0 otherwise. This will add a constant, 
the coefficient of the column, for the pre-intervention period. It represents the difference between the 
pre-intervention mean and the level to which the post-intervention trend is moving—that is, the level 
attained long after the intervention. If this shift is not significantly different from 0, then the model 
shows no permanent effect. If the shift (coefficient) is significantly larger than 0, then a permanent 
decrease in sales is suggested by the model. If the coefficient happens to be negative, then the 
pre-intervention level is less than the level toward which the data are now moving. You issue the 
following code to fit a model with both temporary effects (X) and a permanent level shift (LEVEL1): 

PROC ARIMA;
   IDENTIFY VAR=SALES NOPRINT CROSSCOR=(X LEVEL1);
   ESTIMATE INPUT=( (1)/(1) X LEVEL1 ) P=(1)(12) Q=(2) METHOD=ML;
RUN;


Output 4.35 and Output 4.36 show the results. 


Output 4.35  Model Allowing Permanent Effect (graph)

Output 4.36  Intervention Model with Permanent Shift

                     Maximum Likelihood Estimation

                         Standard              Approx
Parameter    Estimate       Error   t Value   Pr > |t|   Lag   Variable   Shift
MU           75.95497     2.71751     27.95     <.0001     0   sales          0
MA1,1        -0.31261     0.12036     -2.60     0.0094     2   sales          0
AR1,1         0.29442     0.11634      2.53     0.0114     1   sales          0
AR2,1         0.77634     0.07042     11.02     <.0001    12   sales          0
NUM1        -31.67677     3.24333     -9.77     <.0001     0   X              0
NUM1,1       48.55454     2.99270     16.22     <.0001     1   X              0
DEN1,1        0.56565     0.03353     16.87     <.0001     1   X              0
NUM2         10.79096     2.22677      4.85     <.0001     0   level1         0

Constant Estimate       11.98663
Variance Estimate       13.73465
Std Error Estimate      3.706029
AIC                     439.2273
SBC                     457.9778
Number of Residuals           77

                  Autocorrelation Check of Residuals

 To     Chi-           Pr >
Lag   Square    DF    ChiSq    ---------------Autocorrelations---------------
  6     2.46     3   0.4831     0.020  -0.010  -0.091  -0.011  -0.090   0.110
 12     9.47     9   0.3949     0.094   0.125   0.099  -0.033   0.188  -0.083
 18    12.27    15   0.6582    -0.010   0.067  -0.069  -0.030   0.114  -0.068
 24    18.28    21   0.6310     0.080   0.029  -0.012  -0.048  -0.072  -0.196


It appears that the pre-intervention level is about 75.95 +10.79 and the ultimate level to which sales 
will return is 75.95, according to this model. All estimates, including the estimated 10.79 permanent 
loss in sales, are significant. The geometric rate of approach to the new level is 0.56565, indicating a 
faster approach to the new level than that from the first model. Of course, at this point it is clear that 
the old model was misspecified, as it did not include LEVEL1. The AR1,1 coefficient 0.29 is quite a 
bit smaller than 0.53 from the first model. That is consistent with the idea that the autoregressive 
structure there was in part compensating for the poor fit of the mean function. You can add and 
subtract 1.96(2.2268) from 10.79 to get an approximate 95% confidence interval for the permanent 
component of the sales loss due to the contamination scare. 
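
The interval arithmetic described above is easy to reproduce. This Python sketch is only an illustration (the helper name `approx_ci` is made up here); it uses the NUM2 estimate and standard error from Output 4.36 and the large-sample normal multiplier 1.96.

```python
def approx_ci(estimate, std_error, z=1.96):
    """Approximate large-sample interval: estimate +/- z * standard error."""
    half_width = z * std_error
    return estimate - half_width, estimate + half_width


# NUM2 (coefficient of LEVEL1) and its standard error from Output 4.36
low, high = approx_ci(10.79096, 2.22677)
print(f"Approximate 95% interval for the permanent loss: ({low:.2f}, {high:.2f})")
```

The interval lies entirely above 0, which agrees with the significant t value reported for NUM2.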

Other models can be tried. Seasonal dummy variables might be tried in place of the seasonal AR 
factor. Liu et al. suggest that some sort of trend might be added to account for a decline in consumer 
preference for milk. A simple linear trend gives a mild negative slope, but it is not statistically 
significant. The estimated permanent level shift is about the same and still significant in its presence. 


4.4.4 Terrorist Attack 

On September 11, 2001, terrorists used commercial airliners as weapons to attack targets in the 
United States, resulting in the collapse of the World Trade Center in New York City. American 
Airlines flights were among those involved. The stock market was closed following this incident and 
reopened September 17. In a second incident, an American Airlines jet crashed on November 12, 
2001, in Queens, New York. An intervention analysis of American Airlines stock trading volume (in 
millions) is now done incorporating a pulse and level shift intervention for each of these events, 
defined similarly to those of the milk example in Section 4.4.3. Data through November 19 are used 
here, so there is not a lot of information about the nature of the response to the second incident. A 
model that seems to fit the data reasonably well, with parameters estimated from PROC ARIMA, is 

$$\log(\mathrm{Volume}_t) = 0.05 + \frac{2.59 - 2.48B}{1 - 0.76B}\, X_t + \frac{1.49}{1 - 0.80B}\, P_t + \frac{1 - 0.52B}{1 - 0.84B}\, e_t$$

where $X_t$ is a level shift variable that is 1 after September 11 and 0 before, while $P_t$ is a pulse variable 
that is 1 only on the day of the second incident. The p-values for all estimates except the intercept 
were less than 0.0005 and those for the chi-square check of residuals were all larger than 0.35, 
indicating an excellent fit for the 275 log transformed volume values in the data set. 

This model allows for a permanent effect of the terrorist attack of September 11 but forces the effect 
of the second incident to decline exponentially to 0 over time. The second incident sparked a 
log(volume) increase of 1.49 on the day it happened, but j days later, log(volume) is $(0.80)^j(1.49)$ above 
what it would have otherwise been, according to the model. The permanent effect of the events of 
September 11 on log volume would be (2.59-2.48)/(1-.76) = 0.46 according to the model. The 
numerator lag for X allows a single arbitrary change from the initial shock (followed by an 
exponential approach at rate 0.76 to the eventual new level). In that sense, the inclusion of this lag 
acts like a pulse variable and likely explains why the pulse variable for September 11 was not needed 
in the model. The level shift variable for the second incident did not seem to be needed either, but 
with so little data after November 12, the existence of a permanent effect remains in question. 


Output 4.37 shows a graph of the data and a forecast from this model. 


Output 4.37  American Airlines Stock Volume (graph of VOLUME versus DATE)



Calculations from the log model were exponentiated to produce the graph. The model was fit to the 
full data set, but the option BACK=42 was used in the FORECAST statement so that the data 
following September 11 were not used to adjust the forecasts; that is, only the X and P parts of the 
model are used in the post-September 11 forecasts. With that in mind—that is, with no adjustments 
based on recent residuals—it is striking how closely these forecasts mimic the behavior of the data 
after this incident. It is also interesting how similar the decay rates (denominator terms) are for the 
two incidents. Two horizontal lines, one at the pre-intervention level exp(0.05) = 1.05 and one at the 
ultimate level exp(0.05 + (2.59-2.48)/(1-.76)) = exp(0.51) = 1.66, are drawn. 

The permanent effect of the event of September 11 is an increase of (2.59-2.48)/(1-.76) in log 
transformed volume, according to the model. That becomes a multiplicative increase of 
exp((2.59-2.48)/(1-.76)) = 1.58, a 58% increase in volume. 
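
The long-run figures quoted for this model can be traced with a short calculation. The Python sketch below is an illustration only, with a function name (`step_response`) invented here; it feeds a level shift through $(2.59 - 2.48B)/(1 - 0.76B)$ by recursion, confirms that the response settles at the gain (2.59-2.48)/(1-.76), and lists the decaying pulse effect $(0.80)^j(1.49)$ of the second incident.

```python
import math


def step_response(b0, b1, a, n_days):
    """Effect on log(volume) j days after a level shift passed through
    (b0 - b1*B)/(1 - a*B): y_0 = b0, then y_j = a*y_{j-1} + (b0 - b1)."""
    effects = []
    for j in range(n_days):
        if j == 0:
            effects.append(b0)
        else:
            effects.append(a * effects[-1] + (b0 - b1))
    return effects


shift = step_response(2.59, 2.48, 0.76, 200)
gain = (2.59 - 2.48) / (1 - 0.76)            # permanent effect on the log scale
print(f"day 0: {shift[0]:.3f}, day 1: {shift[1]:.3f}, long run: {shift[-1]:.3f}")
print(f"multiplicative long-run effect: {math.exp(gain):.2f}")

# Pulse for the second incident: 1.49 on the day itself, then decay at rate 0.80
pulse = [1.49 * 0.80 ** j for j in range(5)]
print("pulse effect, days 0-4:", [round(p, 3) for p in pulse])
```

The step response starts at 2.59, drops sharply the next day, and then approaches the gain geometrically, which is the "single arbitrary change followed by an exponential approach" described for the numerator lag.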
5.1 Regression with Time Series Errors and Unequal Variances 


5.1.1 Autoregressive Errors 

SAS PROC AUTOREG provides a tool to fit a regression model with autoregressive time series 
errors. Such a model can be written in two steps. With a response $Y_t$ related to a single input $X_t$ and 
with an AR(1) error, you can write 

$$Y_t = \beta_0 + \beta_1 X_t + Z_t$$

and 

$$Z_t = \alpha Z_{t-1} + e_t$$

where $|\alpha| < 1$ for stationarity and $e_t \sim N(0, \sigma^2)$, with obvious extensions to multiple regression and 
AR(p) series. The variance of $Z_t$ is $\sigma^2/(1 - \alpha^2)$, from which the normal density function of 
$Z_1 = Y_1 - \beta_0 - \beta_1 X_1$ can be derived. Furthermore, substitution of $Z_t = Y_t - \beta_0 - \beta_1 X_t$ and its lag into 
$e_t = Z_t - \alpha Z_{t-1}$ shows that 

$$e_t = (Y_t - \beta_0 - \beta_1 X_t) - \alpha (Y_{t-1} - \beta_0 - \beta_1 X_{t-1})$$

and because $e_t \sim N(0, \sigma^2)$, the normal density of this expression can also be written down for 
t = 2, 3, ..., n. Because $Z_1, e_2, e_3, \ldots, e_n$ are independent of each other, the product of these n normal 
densities constitutes the so-called joint density function of the Y values. If the observed data values Y 
and X are plugged into this function, the only unknowns remaining in the function are the parameters 
$\alpha$, $\beta_0$, $\beta_1$, and $\sigma^2$, and this resulting function $L(\alpha, \beta_0, \beta_1, \sigma^2)$ is called the “likelihood function,” 
with the values of $\alpha$, $\beta_0$, $\beta_1$, and $\sigma^2$ that maximize it being referred to as “maximum likelihood 
estimates.” This is the best way to estimate the model parameters. Other methods described below 
have evolved as less computationally burdensome approximations. 

From the expression for e t you can see that 

$$Y_t = \alpha Y_{t-1} + \beta_0 (1 - \alpha) + \beta_1 (X_t - \alpha X_{t-1}) + e_t$$

This suggests that you could use some form of nonlinear least squares to estimate the coefficients. 

A third, less used approach to estimation of the parameters is much less computer intensive, but it 
does not make quite as efficient use of the data as maximum likelihood. It is called the 
Cochrane-Orcutt method and consists of (1) running a least squares regression of Y on X, (2) fitting an 
autoregressive model to the residuals, and (3) using that model to “filter” the data. Writing, as above, 

$$Y_t = \alpha Y_{t-1} + \beta_0 (1 - \alpha) + \beta_1 (X_t - \alpha X_{t-1}) + e_t$$

you observe the equation for a regression of the transformed, or filtered, variable $Y_t - \alpha Y_{t-1}$ on 
transformed variables $(1 - \alpha)$ and $(X_t - \alpha X_{t-1})$. Because $e_t$ satisfies the usual regression properties, 
this regression, done with ordinary least squares (OLS), satisfies all of the usual conditions for 
inference, and the resulting estimates of the parameters would be unbiased, with proper standard 
errors and valid t tests being given by the ordinary least squares formulas applied to these 
transformed variables. When $\alpha$ is replaced by an estimate from a model for the residuals, the 
statements above are approximately true. The Cochrane-Orcutt method can be modified to include an 
equation for the first observation as well. The method can be iterated, using the new regression 
estimates to produce new estimates of $Z_t$ and hence new estimates of $\alpha$, etc.; however, the 
simultaneous iteration on all parameters done by maximum likelihood would generally be preferred. 
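
The three Cochrane-Orcutt steps can be sketched in a few lines. The Python code below is a minimal illustration for one input and an AR(1) error, not the algorithm PROC AUTOREG uses; the function names are invented here, and the lag 1 autocorrelation of the OLS residuals is used as the estimate of $\alpha$, which is one of several reasonable choices. Regressing on $(1 - \alpha)$ without an intercept is done equivalently by fitting an intercept and dividing it by $1 - \alpha$.

```python
def ols(x, y):
    """Least squares of y on x with an intercept; returns (intercept, slope)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def filtered_ols(x, y, alpha):
    """Step 3: regress Y_t - alpha*Y_{t-1} on (1 - alpha) and X_t - alpha*X_{t-1}."""
    yf = [y[t] - alpha * y[t - 1] for t in range(1, len(y))]
    xf = [x[t] - alpha * x[t - 1] for t in range(1, len(x))]
    intercept, beta1 = ols(xf, yf)         # intercept estimates beta0*(1 - alpha)
    return intercept / (1 - alpha), beta1


def cochrane_orcutt(x, y):
    b0, b1 = ols(x, y)                                    # step 1: OLS of Y on X
    r = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]       # OLS residuals
    alpha = (sum(r[t] * r[t - 1] for t in range(1, len(r)))
             / sum(ri * ri for ri in r))                  # step 2: AR(1) estimate
    beta0, beta1 = filtered_ols(x, y, alpha)              # step 3: filtered OLS
    return alpha, beta0, beta1


# Hypothetical data: beta0 = 2, beta1 = 3, and an error series that decays at rate 0.6
x = [float((7 * t) % 11) for t in range(30)]
z = [5 * 0.6 ** t for t in range(30)]
y = [2 + 3 * xi + zi for xi, zi in zip(x, z)]
print(filtered_ols(x, y, 0.6))     # filtering with the true alpha
print(cochrane_orcutt(x, y))       # alpha estimated from the OLS residuals
```

In this artificial series the shocks after the first observation are all 0, so filtering with the true $\alpha$ reproduces the regression coefficients essentially exactly; with real data the estimates are only approximately unbiased, as stated above.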

If the error autocorrelation is ignored and a regression of Y on X is done, the estimated slope and 
intercept will be unbiased and will, under rather general assumptions on X, be consistent—that is, 
they will converge to their true values as the sample size increases. However, the standard errors 
reported in this regression, unlike those for the filtered variable regression, will be wrong, as will the 
p-values and any inference you do with them. Thus ordinary least squares residuals can be used to 
estimate the error autocorrelation structure, but the OLS t tests and associated p-values for the 
intercept and slope(s) cannot be trusted. In PROC AUTOREG, the user sees the initial OLS 
regression, the estimated autocorrelation function computed from the residuals, the autoregressive 
parameter estimates (with insignificant ones being omitted if BACKSTEP is specified), and then the 
final estimation of parameters including standard errors and tests that are valid based on large sample 
theory. 





5.1.2 Example: Energy Demand at a University 

Output 5.1 shows energy demand plotted against temperature and against date. Data were collected 
at North Carolina State University during the 1979-1980 academic year. Three plot symbols are used 
to indicate non-workdays (*), workdays with no classes (dots), and teaching days (+). The goal is to 
relate demand for energy to temperature and type of day. The coefficient of variable WORK will be 
seen to be 2919. The variable WORK is 0 for non-workdays and 1 for workdays, indicating that 
1(2919) is to be added to every prediction for a workday. A similar 0-1 variable called TEACH has 
coefficient 1011, which indicates that 1(1011) should be added to teaching days. Since all teaching 
days are workdays, teaching day demand is 2919+1011 = 3930 higher than non-workdays for any 
given temperature. Workdays that are not teaching days have demand 2919 higher than 
non-workdays for a given temperature. As temperatures rise, demand increases at an increasing rate. The 
three curves on the graph come from a model to be discussed. The plot of demand against date shows 
that there were a couple of workdays during class break periods, e.g., December 31, where demand 
was more like that for non-workdays, as you might expect. You might want to group these with the 
non-workdays. Also, day-of-week dummy variables can be added. A model without these 
modifications will be used. 


Output 5.1  NCSU Energy Demand

(Two graphs: NCSU ENERGY DEMAND vs TEMPERATURE and NCSU ENERGY DEMAND vs DATE. 
Plot symbols: * = OFF, . = WORK, NO CLASSES, + = TEACH.)


The model illustrated here has today's temperature TEMP, its square TEMPSQ, yesterday's 
temperature TEMP1, teaching day indicator TEACH, and workday indicator WORK as explanatory 
variables. Future values of TEACH and WORK would be known, but future values of the 
temperature variables would have to be estimated in order to forecast energy demand into the future. 
Future values of such inputs need to be provided (along with missing values for the response) in the 
data set in order to forecast. No accounting for forecast inaccuracy in future values of the inputs is 
done by PROC AUTOREG. 

To fit the model, issue this code: 

PROC AUTOREG DATA=ENERGY;
   MODEL DEMAND = TEMP TEMPSQ TEACH WORK TEMP1
         / NLAG=15 BACKSTEP METHOD=ML DWPROB;
RUN;

Output 5.2 contains the ordinary least squares regression portion of the PROC AUTOREG output. 

Output 5.2  OLS Regression

                       The AUTOREG Procedure

Dependent Variable    DEMAND

                  Ordinary Least Squares Estimates

SSE                  231077196    DFE                       359
MSE                     643669    Root MSE            802.28989
SBC                 5947.02775    AIC                5923.62837
Regress R-Square        0.8794    Total R-Square         0.8794
Durbin-Watson           0.5331    Pr < DW                <.0001
Pr > DW                 1.0000

NOTE: Pr<DW is the p-value for testing positive autocorrelation,
      and Pr>DW is the p-value for testing negative autocorrelation.

                                Standard              Approx
Variable     DF    Estimate        Error   t Value   Pr > |t|
Intercept     1        3593     202.8272     17.71     <.0001
TEMP          1     27.3512       5.7938      4.72     <.0001
TEMPSQ        1      0.7988       0.1458      5.48     <.0001
TEACH         1        1533     150.2740     10.20     <.0001
WORK          1        2685     158.8292     16.91     <.0001
TEMP1         1     34.0833       5.7923      5.88     <.0001

                   Estimates of Autocorrelations

Lag   Covariance   Correlation
  0       633088      1.000000   |******************
  1       463306      0.731819   |***************
  2       366344      0.578662   |************
  3       312670      0.493881   |**********
  4       291068      0.459760   |*********
  5       311581      0.492161   |**********
  6       335580      0.530069   |***********
  7       352673      0.557068   |***********
  8       302677      0.478096   |**********
  9       273294      0.431684   |*********
 10       232510      0.367262   |*******
 11       203572      0.321555   |******
 12       183762      0.290263   |******
 13       205684      0.324889   |******
 14       244380      0.386012   |********
 15       223499      0.353031   |*******


This regression displays strongly autocorrelated residuals $r_t$. The Durbin-Watson statistic, 

$$DW = \sum_{t=2}^{n} (r_t - r_{t-1})^2 \Big/ \sum_{t=1}^{n} r_t^2 = 0.5331,$$

is significantly less than 2 (p<0.0001), indicating nonzero lag 1 
autocorrelation in the errors. If $r_t$ and $r_{t-1}$ were alike (strong positive correlation), then $r_t - r_{t-1}$ would 
be near 0, thus showing that the DW statistic tends to be less than 2 under positive autocorrelation. 
The DW is expected to be near 2 for uncorrelated data. Extensions of DW to lags of more than 1 are 
available in PROC AUTOREG. The autocorrelation plot shows strong autocorrelations. The 
correction for autocorrelation reveals that lags 7 and 14 are present, indicating some sort of weekly 
effect. Lag 1 is also sensible, but lags 5 and 12 are a little harder to justify with intuition. All others 
of the 15 lags you started with are eliminated automatically by the BACKSTEP option. 
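
The Durbin-Watson formula is simple enough to compute directly. The Python sketch below is illustrative only (the function name is invented here, and the residual series are tiny made-up examples); it shows the statistic falling toward 0 for residuals that stay alike and rising above 2 for residuals that alternate in sign.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(r * r for r in residuals)
    return numerator / denominator


print(durbin_watson([1.0, 1.0, 1.0, 1.0]))     # identical neighbors: 0.0
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))   # alternating signs: 3.0

# DW is roughly 2*(1 - lag 1 autocorrelation); with the lag 1 value 0.731819
# from Output 5.2 this gives about 0.536, close to the reported 0.5331.
print(2 * (1 - 0.731819))
```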


Output 5.3  Autoregressive Parameter Estimates

       Estimates of Autoregressive Parameters

                        Standard
Lag    Coefficient         Error    t Value
  1      -0.580119      0.040821     -14.21
  5      -0.154947      0.045799      -3.38
  7      -0.157783      0.048440      -3.26
 12       0.127772      0.045687       2.80
 14      -0.116690      0.044817      -2.60


In PROC AUTOREG, the model for the error $Z_t$ is written with plus rather than minus signs; that 
is, $(1 + \alpha_1 B + \alpha_2 B^2 + \cdots + \alpha_p B^p) Z_t = e_t$. Therefore the AR(14) error model in Output 5.3 is 

$$Z_t = 0.58 Z_{t-1} + 0.15 Z_{t-5} + 0.16 Z_{t-7} - 0.13 Z_{t-12} + 0.12 Z_{t-14} + e_t$$
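
The sign change between the printed coefficients and the recursive form of the error model is a common source of mistakes, so a small check may help. This Python sketch is only an illustration; the helper names are invented here, and the history of 14 ones is a made-up input used to exercise every lag.

```python
# Coefficients as printed in Output 5.3 (PROC AUTOREG's plus-sign convention)
autoreg_coeffs = {1: -0.580119, 5: -0.154947, 7: -0.157783,
                  12: 0.127772, 14: -0.116690}


def to_recursive_form(coeffs):
    """Flip signs so that Z_t = sum(phi_k * Z_{t-k}) + e_t."""
    return {lag: -value for lag, value in coeffs.items()}


def predict_error(z_history, recursive_coeffs):
    """Predicted Z_t, where z_history[-1] is Z_{t-1}, z_history[-2] is Z_{t-2}, ..."""
    return sum(phi * z_history[-lag] for lag, phi in recursive_coeffs.items())


phi = to_recursive_form(autoreg_coeffs)
print({lag: round(value, 2) for lag, value in phi.items()})
print(predict_error([1.0] * 14, phi))   # sum of the recursive coefficients
```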


Using this AR(14) structure and these estimates as initial values, the likelihood function is computed 
and maximized, producing the final estimates with correct (justified by large sample theory) standard 
errors in Output 5.4. 


Output 5.4  Final Estimates

                                Standard              Approx
Variable     DF    Estimate        Error   t Value   Pr > |t|
Intercept     1        4638     407.4300     11.38     <.0001
TEMP          1     23.4711       3.5389      6.63     <.0001
TEMPSQ        1      0.7405       0.1162      6.37     <.0001
TEACH         1        1011     114.2874      8.84     <.0001
WORK          1        2919     115.5118     25.27     <.0001
TEMP1         1     25.0550       3.5831      6.99     <.0001
AR1           1     -0.6490       0.0401    -16.18     <.0001
AR5           1     -0.1418       0.0478     -2.97     0.0032
AR7           1     -0.1318       0.0504     -2.62     0.0093
AR12          1      0.1420       0.0481      2.95     0.0033
AR14          1     -0.1145       0.0469     -2.44     0.0151



Using OLS from Output 5.2 or from PROC REG, a 95% confidence interval for the effect, on 
energy demand, of teaching classes would have been incorrectly computed as 1533 ± 1.96(150), 
whereas the correct interval is 1011 ± 1.96(114). 

Using the same regression inputs in PROC ARIMA, a model with p=(1), q=(1,7,14) showed no lack 
of fit in Output 5.5. This error model is a bit more aesthetically pleasing than that of AUTOREG, as 
it does not include the unusual lags 5 and 12. Note that AUTOREG cannot fit moving average terms. 

Output 5.5  PROC ARIMA for Energy Data

                          The ARIMA Procedure

                     Maximum Likelihood Estimation

                          Standard              Approx
Parameter    Estimate        Error   t Value   Pr > |t|   Lag   Variable   Shift
MU             4468.1    363.30832     12.30     <.0001     0   DEMAND         0
MA1,1         0.25723      0.06257      4.11     <.0001     1   DEMAND         0
MA1,2        -0.19657      0.05391     -3.65     0.0003     7   DEMAND         0
MA1,3        -0.18440      0.05289     -3.49     0.0005    14   DEMAND         0
AR1,1         0.84729      0.03622     23.39     <.0001     1   DEMAND         0
NUM1         25.49771      3.45642      7.38     <.0001     0   TEMP           0
NUM2          0.74724      0.11173      6.69     <.0001     0   TEMPSQ         0
NUM3        838.08859    111.69438      7.50     <.0001     0   TEACH          0
NUM4           3085.4    114.90204     26.85     <.0001     0   WORK           0
NUM5         25.32913      3.44973      7.34     <.0001     0   TEMP1          0

                  Autocorrelation Check of Residuals

 To     Chi-           Pr >
Lag   Square    DF    ChiSq    ---------------Autocorrelations---------------
  6     3.85     2   0.1461     0.018  -0.007  -0.042  -0.052   0.022   0.071
 12    11.04     8   0.1995    -0.000   0.008   0.077   0.040  -0.036  -0.101
 18    14.65    14   0.4022    -0.007   0.010  -0.019   0.084  -0.014  -0.042
 24    15.71    20   0.7347    -0.003   0.011   0.049  -0.002   0.013  -0.005
 30    20.87    26   0.7487     0.002   0.074  -0.073   0.043   0.007   0.012
 36    28.24    32   0.6575     0.021  -0.017   0.075   0.002   0.106  -0.025
 42    36.98    38   0.5163     0.052   0.015   0.050   0.018   0.042   0.117
 48    38.90    44   0.6895    -0.014   0.046  -0.033  -0.021  -0.022  -0.016


The effect on energy demand of teaching classes is estimated from PROC ARIMA as 838 with 
standard error 112, somewhat different from PROC AUTOREG and quite different from the OLS 
estimates. The purely autoregressive model from PROC AUTOREG and the mixed ARMA error 
model can both be estimated in PROC ARIMA. Doing so (not shown here) will show the AIC and 
SBC criteria to be smaller (better) for the model with the mixed ARMA error. The chi-square white 
noise tests, while acceptable in both, have higher (better) p-values for the mixed ARMA error 
structure. 


5.1.3 Unequal Variances 

The models discussed thus far involve white noise innovations, or shocks, that are assumed to have 
constant variance. For long data sets, it can be quite apparent just from a graph that this constant 
variance assumption is unreasonable. PROC AUTOREG provides methods for handling such 
situations. In Output 5.6 you see graphs of 8892 daily values (from January 1, 1920 with $Y_1$ = 
108.76 to December 31, 1949 with $Y_{8892}$ = 200.13) of $Y_t$ = the Dow Jones Industrial Average, 
$L_t = \log(Y_t)$, and $D_t = \log(Y_t) - \log(Y_{t-1})$. Clearly the log transformation improves the statistical 
properties and gives a clearer idea of the long-term increase than does the untransformed series. 
Many macroeconomic time series are better understood on the logarithmic scale over long periods of 
time. By the properties of logarithms, note that $D_t = \log(Y_t / Y_{t-1})$, and if $Y_t / Y_{t-1}$ is near 1 then $D_t$ 
is approximately $(Y_t - Y_{t-1})/Y_{t-1}$; that is, $100 D_t$ represents the daily percentage change in the Dow 
Jones average. 

To demonstrate how this works, let $\Delta = (Y_t - Y_{t-1})/Y_{t-1}$ so $1 + \Delta = Y_t / Y_{t-1}$. Using a Taylor series 
expansion of log(X) at X=1, you can represent $\log(1 + \Delta) = \log(1) + 1^{-1}\Delta - (1^{-2}/2)\Delta^2 + \cdots$ since 
$(\partial/\partial X)\log(X) = 1/X$, $(\partial/\partial X)(1/X) = -1/X^2$, and so on. Since log(1) = 0, 
$\log(1 + \Delta) = \log(Y_t / Y_{t-1})$ can be approximated by $0 + \Delta = (Y_t - Y_{t-1})/Y_{t-1}$. This also shows that 
$\log(Y_t / Y_{t-1})$ is essentially the overnight return on a one-dollar investment. 
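
A quick numerical comparison shows how good the approximation is for changes of typical daily size. The Python sketch below is illustrative only; the function names and the price pairs are made up here and are not taken from the Dow Jones data.

```python
import math


def pct_change(y_prev, y):
    """Relative change (Y_t - Y_{t-1}) / Y_{t-1}."""
    return (y - y_prev) / y_prev


def log_diff(y_prev, y):
    """Differenced log, log(Y_t) - log(Y_{t-1})."""
    return math.log(y) - math.log(y_prev)


# Hypothetical consecutive closing values, from a small move to a large one
for y_prev, y in [(100.0, 100.5), (100.0, 101.0), (100.0, 105.0), (100.0, 88.0)]:
    delta = pct_change(y_prev, y)
    d = log_diff(y_prev, y)
    print(f"relative change {delta:+.4f}   log difference {d:+.4f}   gap {delta - d:.5f}")
```

The gap between the two measures is close to $\Delta^2/2$, so it is negligible for changes of a percent or so and becomes noticeable only for unusually large daily moves.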

The graph of $D_t$ shows some periods of high volatility. The five vertical graph lines represent, from 
left to right, Black Thursday (October 24, 1929, when the stock market crashed), the inauguration of 
President Franklin D. Roosevelt (FDR), the start of World War II, the bombing of Pearl Harbor, and 
the end of World War II. Note especially the era from Black Thursday until a bit after FDR assumed 
office, known as the Great Depression. 

The mean of the $D_t$ values is 

$$\bar{D} = n^{-1} \sum_{t=2}^{n} (\log(Y_t) - \log(Y_{t-1})) = n^{-1} (\log(Y_n) - \log(Y_1))$$

so that $e^{n\bar{D}} = Y_n / Y_1$, the increase in the series over the entire time period. For the data at hand, the ratio of 
the last to first data point is 200.13/108.76 = 1.84, so the series did not quite double over this 30-year 
period. You might argue that subperiods like the depression in which extreme volatility is present are 
not typical and should be ignored or at least downweighted in computing a rate of return that has 
some relevance for future periods. You decide to take a look at the variability of the $D_t$ series. 
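
The telescoping of the differenced logs can be verified on any short series. In the Python sketch below, which is only an illustration, the function name and the two middle values of the series are made up; the first and last values are the endpoints 108.76 and 200.13 quoted above, and the sum is divided by n as in the formula in the text.

```python
import math


def mean_log_difference(series):
    """Sum of log(Y_t) - log(Y_{t-1}) for t = 2..n, divided by n as in the text."""
    n = len(series)
    diffs = [math.log(series[t]) - math.log(series[t - 1]) for t in range(1, n)]
    return sum(diffs) / n


# Endpoints from the text; the interior values are hypothetical
series = [108.76, 120.00, 95.50, 200.13]
d_bar = mean_log_difference(series)
n = len(series)
print(math.exp(n * d_bar))         # telescopes to the ratio of last to first
print(series[-1] / series[0])      # 200.13 / 108.76, about 1.84
```

Whatever the interior values are, only the endpoints matter, which is why the long-run increase says nothing about the volatility in between.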

Because there is so much data, the reduction of each month's $D_t$ numbers to a standard deviation 
still leaves a relatively long time series of 360 monthly numbers. These standard deviations have a 
histogram with a long tail to the right. Again a logarithmic transform is used to produce a monthly 
series $S_t$ = log(standard deviation) that has a more symmetric distribution. Thus $S_t$ measures the 
volatility in the series, and a plot of $S_t$ versus time is the fourth graph in Output 5.6. 
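
The reduction of daily differenced logs to a monthly log standard deviation can be sketched as follows. This Python code is only an illustration of the idea; the function name, the (year, month) keys, and the five daily values are hypothetical, and the sample standard deviation (divisor one less than the number of days) is an assumption, since the text does not say which divisor was used.

```python
import math
from collections import defaultdict
from statistics import stdev


def monthly_log_std(dated_values):
    """Map each (year, month) key to log(sample std) of that month's values."""
    groups = defaultdict(list)
    for key, value in dated_values:
        groups[key].append(value)
    return {key: math.log(stdev(values)) for key, values in sorted(groups.items())}


# Hypothetical daily differenced logs D_t tagged with (year, month)
daily = [((1929, 10), 0.01), ((1929, 10), -0.01), ((1929, 10), 0.03),
         ((1929, 11), 0.00), ((1929, 11), 0.02)]
for month, s in monthly_log_std(daily).items():
    print(month, round(s, 4))
```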

Output 5.6  Dow Jones Industrial Average on Several Scales

(Four graphs plotted against date: DOW JONES on the original scale, LOG SCALE, DIFFERENCED LOGS, 
and VOLATILITY = log(std). Reference lines: Black Thursday, FDR assumes office, WWII starts, 
Pearl Harbor, end WWII.)

Now apply a time series model to the $S_t$ series. The tau test for stationarity suggests a unit root 
process when six augmenting lags are used. The reason for choosing six lags is that the partial 
autocorrelation function for $S_t$ is near 0 after lag 6 and, furthermore, a regression of $S_t - S_{t-1}$ on 
$S_{t-1}$ and 20 lagged differences ($S_{t-j} - S_{t-j-1}$ for j = 1 to 20) in PROC REG gave an insignificant F test 
for lags 7 through 20. A similar regression using six lagged differences showed all six to be 
significant according to their t tests. Dickey and Fuller show that such t tests on lagged differences 
are valid in large samples; only the test for the coefficient on the lagged level $S_{t-1}$ has a 
nonstandard distribution. That test cannot reject the unit root hypothesis, and so a model in first 
differences is suggested for the log transformed standard deviation series $S_t$. The above results are 
not displayed. At this point you are ready to model $S_t$. You have seen that a lag 6 autoregressive 
model for $S_t - S_{t-1}$ seems to provide an adequate fit. Perhaps this long autoregression is an 
approximation of a mixed model. The following code, using LSTD as the variable name for $S_t$, 
seems to provide a reasonable ARMA(1,1) model: 

PROC ARIMA DATA=OUT1;
   /* I and E abbreviate the IDENTIFY and ESTIMATE statements */
   I VAR=LSTD(1) STATIONARITY=(ADF=(6));
   E P=1 Q=1 ML NOCONSTANT;
RUN;


The constant was suppressed (NOCONSTANT) after an initial check showed it to be insignificant. 
The tau test for unit roots suggests stationarity of the differenced series (p=0.0001) when six lagged 
differences are used. That is, no further differencing seems to be needed. Said and Dickey (1984) 
show that even for mixed models, these stationarity tests are valid as long as sufficient lagged 
differences are included in the model. In summary, the S series appears to be well modeled as an 
ARIMA(1,1,1) series with parameters as shown in Output 5.7. 


Output 5.7  ARIMA Model for S

                  Augmented Dickey-Fuller Unit Root Tests

Type           Lags        Rho   Pr < Rho      Tau   Pr < Tau        F   Pr > F
Zero Mean         6   322.9722     0.9999   -11.07     <.0001
Single Mean       6   322.2521     0.9999   -11.06     <.0001    61.14   0.0010
Trend             6   322.0197     0.9999   -11.05     <.0001    61.05   0.0010

                     Maximum Likelihood Estimation

                         Standard              Approx
Parameter    Estimate       Error   t Value   Pr > |t|   Lag
MA1,1         0.82365     0.04413     18.66     <.0001     1
AR1,1         0.32328     0.07338      4.41     <.0001     1

                  Autocorrelation Check of Residuals

 To     Chi-           Pr >
Lag   Square    DF    ChiSq    ---------------Autocorrelations---------------
  6     2.07     4   0.7230     0.010  -0.016  -0.045   0.053   0.015   0.016
 12     7.95    10   0.6341     0.072   0.060  -0.018   0.044  -0.005   0.070
 18    19.92    16   0.2236    -0.013  -0.014  -0.134   0.065   0.078   0.055
 24    23.96    22   0.3494     0.053  -0.022  -0.080   0.018   0.015   0.017
 30    34.37    28   0.1890    -0.045   0.016  -0.131   0.081  -0.002  -0.024
 36    37.38    34   0.3164     0.014   0.027   0.056   0.010  -0.057   0.010
 42    38.83    40   0.5227    -0.018   0.005  -0.050   0.016  -0.007  -0.022
 48    41.69    46   0.6533    -0.015  -0.010  -0.028   0.070  -0.004  -0.030


The model suggests the predicting equation 

$$S_t = S_{t-1} + 0.3233 (S_{t-1} - S_{t-2}) - 0.82365\, e_{t-1}$$

where $e_{t-1}$ would be replaced by the residual $S_{t-1} - \hat{S}_{t-1}$. Exponentiation of $S_t$ gives a conditional 
standard deviation for month t. Notice that because $S_t$ is a logarithm, the resulting standard 
deviations will all be positive regardless of the sign of $S_t$. This allows the variance to change over 
time in a way that can be predicted from the most recent few variances. The theory underlying 
ARIMA models is based on large sample arguments and does not require normality, so the use of log 
transformed standard deviations as data does not necessarily invalidate this approach. However, there 
are at least two major problems with approaching heterogeneous variation in the manner just used 
with the Dow Jones series. First, you will not often have so much data to start with, and second, the 
use of a month as a period for computing a standard deviation is quite arbitrary. A more statistically 
rigorous approach is now presented. The discussion thus far has been presented as a review of unit 
root test methodology as well as a motivation for fitting a nonconstant variance model that might 
involve a unit root. An analyst likely would use the more sophisticated approach shown in the next 
section. 


5.1.4 ARCH, GARCH, and IGARCH for Unequal Variances 

The series $D_t$, whose variability is measured by $S_t$, has nonconstant conditional variance. Engle
(1982) introduced a model in which the variance at time t is modeled as a linear combination of past
squared residuals and called it an ARCH (autoregressive conditionally heteroscedastic) process.
Bollerslev (1986) introduced a more general structure in which the variance model looks more like an
ARMA than an AR and called this a GARCH (generalized ARCH) process. Thus the usual approach
to modeling ARCH or GARCH processes improves on the method just shown in substantial ways.
The purpose of the monthly standard deviation approach was to illustrate the idea of an ARMA type
of structure for standard deviations or variances.

The usual approach to GARCH(p,q) models is to model an error term $\varepsilon_t$ in terms of a standard white
noise $e_t \sim N(0,1)$ as $\varepsilon_t = \sqrt{h_t}\,e_t$, where $h_t$ satisfies the type of recursion used in an ARMA model:

$h_t = \omega + \sum_{i=1}^{q} \alpha_i \varepsilon_{t-i}^2 + \sum_{j=1}^{p} \gamma_j h_{t-j}$

In this way, the error term has a conditional variance that is a function of the magnitudes of past
errors. Engle's original ARCH structure has $\gamma_j = 0$. Because $h_t$ is the variance rather than its
logarithm, certain restrictions must be placed on the $\alpha_i$, $\gamma_j$, and $\omega$ to ensure positive variances. For
example, if these are all restricted to be positive, then positive initial values of $h_t$ will ensure all $h_t$
are positive. For this reason, Nelson (1991) suggested replacing $h_t$ with $\log(h_t)$ and an additional
modification; he called the resulting process EGARCH. These approaches allow the standard
deviation to change with each observation. Nelson and Cao (1992) give constraints on the $\alpha$ and
$\gamma$ values that ensure nonnegative estimates of $h_t$. These are the default in PROC AUTOREG. More
details are given in the PROC AUTOREG documentation and in Hamilton (1994), which is a quite
detailed reference for time series.
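The recursion for $h_t$ can be traced by hand for a short series of shocks. The following Python sketch is only an illustration and is not part of the SAS analysis: the function name `garch_variances`, the start-up value `h_start` used for terms before the sample, and the four shock values are assumptions made for the example. The coefficients are the ARCH1, GARCH1, and GARCH2 estimates reported in Output 5.8.

```python
def garch_variances(eps, omega, alpha, gamma, h_start):
    """Conditional variances h_t from the GARCH(p,q) recursion.

    alpha[i-1] multiplies eps_{t-i}^2 (i = 1..q) and gamma[j-1] multiplies
    h_{t-j} (j = 1..p). Squared shocks and variances that fall before the
    start of the sample are replaced by h_start (an assumption needed only
    to start the recursion).
    """
    h = []
    for t in range(len(eps)):
        value = omega
        for i, a in enumerate(alpha, start=1):
            value += a * (eps[t - i] ** 2 if t - i >= 0 else h_start)
        for j, g in enumerate(gamma, start=1):
            value += g * (h[t - j] if t - j >= 0 else h_start)
        h.append(value)
    return h


# Hypothetical shocks; omega = 0 mimics the NOINT option, and the remaining
# coefficients are the estimates shown in Output 5.8.
shocks = [0.01, -0.03, 0.002, 0.015]
h_path = garch_variances(shocks, 0.0, [0.0698], [0.7078, 0.2224], h_start=0.0001735)
```

With all coefficients nonnegative and a positive start-up value, every element of `h_path` is positive, which is the point made above about restrictions that guarantee positive variances.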

Recall that PROC AUTOREG will fit a regression model with autoregressive errors using the
maximum likelihood method based on a normal distribution. In place of the white noise shocks in the
autoregressive error model you can specify a GARCH(p,q) process. If it appears, as suggested by
your analysis of the Dow Jones standard deviations, that the process describing the error variances is
a unit root process, then the resulting model is referred to as integrated GARCH or IGARCH. If the
usual stationarity conditions are satisfied, then for a GARCH process, forecasts of $h_t$ will revert to a
long-run mean. In an IGARCH model, mean reversion is no longer a property of $h_t$, so forecasts of
$h_t$ will tend to reflect the most recent variation rather than the average historical variation. You
would expect the variation during the Great Depression to have little effect on future $h_t$ values in an
IGARCH model of the Dow Jones data.

To investigate models of the daily percentage change in the Dow Jones Industrial Average $Y_t$, you
will use $D_t = \log(Y_t) - \log(Y_{t-1})$. Calling this variable DDOW, you issue this code:

PROC AUTOREG DATA=MORE;
   MODEL DDOW = / NLAG=2
         GARCH=(P=2,Q=1,TYPE=INTEG,NOINT);
   OUTPUT OUT=OUT2 HT=HT P=F LCLI=L UCLI=U;
RUN;


PROC AUTOREG allows the use of regression inputs; however, here there is no apparent time trend 
or seasonality and no other regressors are readily available. The model statement DDOW = (with no 
inputs) specifies that the regression part of your model is only a mean. Note the way in which the $h_t$
sequence, predicted values, and default upper and lower forecast limits have been requested in the
data set called OUT2.

In Output 5.8, the estimate of the mean is seen to be 0.000363. Since DDOW is a difference, a mean
is interpreted as a drift in the data, and since the data are log differences, the number
$e^{0.000363} = 1.0003631$ is an estimate of the long-run daily growth over this time period. With 8892 days in the
study period, the number $e^{(0.000363)(8891)} = 25$ represents a 25-fold increase, roughly an 11.3% yearly
growth rate! This is not remotely like the rate of growth seen, except in certain portions of the graph.
PROC AUTOREG starts with OLS estimates so that the average DDOW over the period is the OLS
intercept 0.0000702 from Output 5.8. This gives $e^{(0.0000702)(8891)} = 1.87$, indicating 87% growth for
the full 30-year period. This has to be more in line with the graph because, as you saw earlier, except
for rounding error it is $Y_n / Y_1$.
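The growth arithmetic can be checked directly. This short Python sketch is only a calculator-style check, not part of the SAS run; the function name `growth_factor` and the variable names are hypothetical, and the 30-year period length is taken from the text.

```python
import math

N_DIFFS = 8891   # number of daily log differences among 8892 days
YEARS = 30       # length of the study period, as stated in the text


def growth_factor(mean_log_diff, n):
    """Total growth implied by a constant mean log difference over n steps."""
    return math.exp(mean_log_diff * n)


igarch_total = growth_factor(0.000363, N_DIFFS)         # roughly 25-fold
ols_total = growth_factor(0.0000702, N_DIFFS)           # roughly 1.87-fold
igarch_annual = igarch_total ** (1.0 / YEARS) - 1.0     # a little over 11% per year
```

The contrast between the two totals is what makes the IGARCH estimate of the mean suspect.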

Note also the strong rejection of normality. The normality test used here is that of Jarque and
Bera (1980). This is a general test of normality based on a measurement of skewness $b_1$ and one of
kurtosis $b_2 - 3$ using residuals $r_t$, where

$b_1 = \frac{\sum_{t=1}^{n} r_t^3 / n}{\left(\sum_{t=1}^{n} r_t^2 / n\right)^{3/2}}$ and $b_2 - 3 = \frac{\sum_{t=1}^{n} r_t^4 / n}{\left(\sum_{t=1}^{n} r_t^2 / n\right)^{2}} - 3$

The expression $\sum_{t=1}^{n} r_t^j / n$ is sometimes called the (raw) jth moment of r. The fractions involve third
and fourth moments scaled by the sample variance. The numerators are sums of approximately
independent terms and thus satisfy a central limit theorem. Both have, approximately, mean 0 when
the true errors are normally distributed. Approximate variances of the skewness and kurtosis are 6/n
and 24/n. Odd and even powers of normal errors are uncorrelated, so squaring each of these
approximately normal variates and dividing by its variance produces a pair of squares of
approximately independent N(0,1) variates. The sum of these squared variates, therefore, follows a
chi-square distribution with two degrees of freedom under the normality null hypothesis. The
Jarque-Bera test

$JB = n\left(b_1^2/6 + (b_2 - 3)^2/24\right)$

has (approximately) a chi-square distribution with two degrees of freedom under the null hypothesis.
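The statistic is simple enough to compute by hand from a list of residuals. The Python sketch below follows the raw-moment formulas given above; the function name `jarque_bera` is hypothetical, and the p-value uses the fact that the upper tail probability of a chi-square with two degrees of freedom at x is exp(-x/2). It is not the code PROC AUTOREG uses internally.

```python
import math


def jarque_bera(residuals):
    """Return (JB, p_value) using raw moments of the residuals."""
    n = len(residuals)
    m2 = sum(r ** 2 for r in residuals) / n
    m3 = sum(r ** 3 for r in residuals) / n
    m4 = sum(r ** 4 for r in residuals) / n
    b1 = m3 / m2 ** 1.5           # skewness
    b2 = m4 / m2 ** 2             # kurtosis (3 for normal data)
    jb = n * (b1 ** 2 / 6.0 + (b2 - 3.0) ** 2 / 24.0)
    p_value = math.exp(-jb / 2.0)  # chi-square(2) upper tail probability
    return jb, p_value


# Six residuals alternating between -1 and 1: skewness 0, kurtosis 1.
jb_demo, p_demo = jarque_bera([-1.0, 1.0] * 3)
```

For the demonstration residuals, the skewness term is 0 and the kurtosis term is $(1-3)^2/24$, so JB equals 6(4/24) = 1.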

Why is the IGARCH model giving a 25-fold increase? It seems unreasonable. The model indicates, 
and the data display, large variability during periods when there were steep drops in the Dow Jones 
average. A method that accounts for different variances tends to downweight observations with high 
variability. In fact there are some periods in which the 11.3% annual rate required for a 25-fold
increase ($1.113^{30} = 25$) was actually exceeded, such as in the periods leading up to the Great
Depression, after FDR assumed office, and toward the end of WWII. The extremely large variances 
associated with periods of decrease or slow growth give them low weight, and that would tend to 
increase the estimated growth rate, but it is still not quite enough to explain the results.

Output 5.8 IGARCH Model for Dow Jones

                     The AUTOREG Procedure

                  Dependent Variable    ddow

                Ordinary Least Squares Estimates

SSE                  1.542981     DFE                    8891
MSE                 0.0001735     Root MSE            0.01317
SBC                -51754.031     AIC              -51761.124
Regress R-Square       0.0000     Total R-Square       0.0000
Durbin-Watson          1.9427

                                Standard                Approx
Variable     DF    Estimate        Error    t Value    Pr > |t|
Intercept     1   0.0000702     0.000140       0.50      0.6155

             Estimates of Autoregressive Parameters

                             Standard
Lag      Coefficient            Error    t Value
  1        -0.029621         0.010599      -2.79
  2         0.037124         0.010599       3.50

Algorithm converged.

                   Integrated GARCH Estimates

SSE                1.54537859     Observations           8892
MSE                 0.0001738     Uncond Var
Log Likelihood     28466.2335     Total R-Square
SBC                -56887.003     AIC              -56922.467
Normality Test      3886.0299     Pr > ChiSq           <.0001

                                Standard                Approx
Variable     DF    Estimate        Error    t Value    Pr > |t|
Intercept     1    0.000363    0.0000748       4.85      <.0001
AR1           1     -0.0868     0.009731      -8.92      <.0001
AR2           1      0.0323     0.009576       3.37      0.0008
ARCH1         1      0.0698     0.003963      17.60      <.0001
GARCH1        1      0.7078       0.0609      11.63      <.0001
GARCH2        1      0.2224       0.0573       3.88      0.0001

Perhaps more importantly, the rejection of normality by the Jarque-Bera test introduces the
possibility of bias in the estimated mean. In an ordinary least squares (OLS) regression of a column
Y of responses on a matrix X of explanatory variables, the model is $Y = X\beta + e$ and the estimated
parameter vector $\hat{\beta} = (X'X)^{-1}(X'Y) = \beta + (X'X)^{-1}(X'e)$ is unbiased whenever the random vector e
has mean 0. In regard to bias, it does not matter if the variances are unequal or even if there is
correlation among the errors. These features only affect the variance of the estimates, causing biases
in the standard errors for $\hat{\beta}$ but not in the estimates of $\beta$ themselves. In contrast to OLS, GARCH
and IGARCH models are fit by maximum likelihood assuming a normal distribution. Failure to meet
this assumption could produce bias in parameter estimates such as the estimated mean.

As a check to see if bias can be induced by nonnormal errors, data from a model having the same $h_t$
sequence as that estimated for the Dow Jones log differences data were generated for innovations
$e_t \sim N(0,1)$ and again for innovations $(e_t^2 - 1)/\sqrt{2}$, so this second set of innovations used the same
normal variables in a way that gave a skewed distribution still having mean 0 and variance 1. The
mean was set at 0.00007 for the simulation and 50 such data sets were created. For each data set,
IGARCH models were fit for each of the two generated series and the estimated means were output
to a data set. The overall mean and standard deviation of each set of 50 means were as follows:



                     Mean    Standard Deviation
Normal Errors    0.000071             0.0000907
Skewed Errors    0.000358             0.0001496


Thus it seems that finding a factor of 5 bias in the estimate of the mean (of the differenced logs) 
could be simply a result of error skewness; in fact the factor of 5 is almost exactly what the 
simulation shows. The simulation results also show that if the errors had been normal, good estimates 
of the true value, known in the simulation to be 0.00007, would have resulted. 
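Only the construction of the two innovation series is easy to show outside SAS. The Python sketch below generates paired normal and skewed innovations as described above and checks their sample moments; it does not fit any IGARCH model. The function names, the seed, and the sample size are assumptions chosen for the illustration.

```python
import math
import random


def paired_innovations(n, seed=12345):
    """Normal innovations e_t and skewed innovations (e_t^2 - 1)/sqrt(2)
    built from the same normal draws."""
    rng = random.Random(seed)
    normal = [rng.gauss(0.0, 1.0) for _ in range(n)]
    skewed = [(e * e - 1.0) / math.sqrt(2.0) for e in normal]
    return normal, skewed


def mean_var_skew(x):
    """Sample mean, variance (divisor n), and skewness coefficient."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    skew = sum((v - mean) ** 3 for v in x) / n / var ** 1.5
    return mean, var, skew


normal, skewed = paired_innovations(200000)
normal_stats = mean_var_skew(normal)
skewed_stats = mean_var_skew(skewed)
```

Both series should show mean near 0 and variance near 1, while only the second shows pronounced positive skewness, which is the feature the simulation isolates.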

Using TYPE=INTEG specifies an IGARCH model for $h_t$, which, like any unit root model, will have a
linearly changing forecast if an intercept is present. You thus use NOINT to suppress the intercept.
Using P=2 and Q=1, your $h_t$ model has the form

$h_t = h_{t-1} + 0.7078(h_{t-1} - h_{t-2}) + 0.2224(h_{t-2} - h_{t-3}) + 0.0698\,\varepsilon_{t-1}^2$

You can look at $h_t$ as a smoothed local estimate of the variance, computed by adding to the previous
smoothed value ($h_{t-1}$) a weighted average of the two most recent changes in these smoothed values
and the square of the most recent shock.

By default, PROC AUTOREG uses a constant variance to compute prediction limits; however, you
can output the $h_t$ values in a data set as shown and then, recalling that $h_t$ is a local variance, add and
subtract $t_{0.975}\sqrt{h_t}$ from your forecast to produce forecast intervals that incorporate the changing
variance. Both kinds of prediction intervals are shown in Output 5.9, where the more or less
horizontal bands are the AUTOREG defaults and the bands based on $h_t$ form what looks like a border
to the data. The data set MORE used for Output 5.8 and Output 5.9 has the historical data and 500
additional days with dates but no values of $D_t$. PROC AUTOREG will produce $h_t$ values and
default prediction limits for these. In general, future values of all inputs need to be included for this
to work, but here the only input is the intercept.

Output 5.9 Default and $h_t$-Based Intervals
[Plot of the differenced logs (axis label: DIFFERENCED LOGS) with both sets of prediction limits]

The default prediction intervals completely miss the local features of the data and come off the end of
the data with a fairly wide spread. Since the last few data periods were relatively stable, the $h_t$-based
intervals are appropriately narrower. It appears that $(h_{t-1} - h_{t-2})$, $(h_{t-2} - h_{t-3})$, and $\varepsilon_{t-1}^2$ were fairly small
at the end of the series, contributing very little to $h_{n+1}$, so that $h_{n+1}$ is approximately $h_n$, as are all
$h_{n+j}$ for $j > 0$. The forecast intervals coming off the end of the series thus have about the same width
as the last forecast interval in the historical data. They are almost, but not exactly, two horizontal
lines.
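Once the forecasts and the $h_t$ values have been output, the variance-based limits are a one-line calculation per observation. The Python sketch below is an illustration under stated assumptions: the function name `h_based_limits` is hypothetical, the multiplier 1.96 stands in for $t_{0.975}$ (reasonable with thousands of observations), and the two input values are made up.

```python
import math


def h_based_limits(forecasts, h, multiplier=1.96):
    """Lower and upper limits: forecast -/+ multiplier * sqrt(h_t)."""
    lower = [f - multiplier * math.sqrt(v) for f, v in zip(forecasts, h)]
    upper = [f + multiplier * math.sqrt(v) for f, v in zip(forecasts, h)]
    return lower, upper


# Hypothetical forecasts and local variances for two days.
lower, upper = h_based_limits([0.0, 0.001], [0.0004, 0.0001])
```

A quiet stretch with small $h_t$ gives a narrow band and a volatile stretch gives a wide one, which is why these limits hug the data in Output 5.9.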

The autoregressive error model is seen to be

$Z_t = 0.0868 Z_{t-1} - 0.0323 Z_{t-2} + \varepsilon_t$

where $\varepsilon_t = \sqrt{h_t}\,e_t$. Although the lag Z coefficients are statistically significant, they are small, so their
contribution to forecasts and to the width of prediction intervals into the future is imperceptible in the
graph.

Clearly the IGARCH estimated mean 0.000363 is unacceptable in light of the nonnormality, the
resulting danger of bias, and its failure to represent the observed growth over the period. The
ordinary mean 0.00007 is an unbiased estimate and exactly reproduces the observed growth. The
usual conditions leading to the (OLS) formula for the standard error of a mean do not hold here, but
more will be said about this shortly. The problem is not with IGARCH versus GARCH; in fact a
GARCH(2,1) model also fits the series quite nicely but still gives an unacceptable estimate of the
mean of $D_t$. Note that the average $\bar{\varepsilon}$ of n independent values of $\varepsilon_t = \sqrt{h_t}\,e_t$ has variance
$n^{-2}\sum_{t=1}^{n} h_t$ if $e_t$ has mean 0 and variance 1. The AR(2) error series

$Z_t = \alpha_1 Z_{t-1} + \alpha_2 Z_{t-2} + \varepsilon_t$

can be summed from 1 to n on both sides and divided by n to get $(1 - \alpha_1 - \alpha_2)\bar{Z}$ approximately
equal to $\bar{\varepsilon}$. From $\varepsilon_t = \sqrt{h_t}\,e_t$ it follows that the (approximate) variance of $(1 - \alpha_1 - \alpha_2)\bar{Z}$ is
$n^{-2}\sum_{t=1}^{n} h_t$ and that of $\bar{Z}$ is thus $(1 - \alpha_1 - \alpha_2)^{-2}\, n^{-2}\sum_{t=1}^{n} h_t$. Hamilton (1994, p. 663) indicates that
maximum likelihood estimates of $h_t$ are reasonable under rather mild assumptions for ARCH models
even when the errors are not normal. Also the graphical evidence indicates that the estimated $h_t$
series has captured the variability in the data nicely. Proceeding on that basis, you sum the estimated
$h_t$ series and use estimated autoregressive coefficients to estimate the standard deviation of the mean

$(1 - \hat{\alpha}_1 - \hat{\alpha}_2)^{-1}\, n^{-1} \sqrt{\sum_{t=1}^{n} \hat{h}_t}$

In this way you get

$t = 0.00007 / 0.0001485 = 0.47$

which is not significant at any reasonable level.
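The standard error just described needs only the estimated $h_t$ series and the two autoregressive coefficients. The Python sketch below shows the arithmetic; the function name `se_of_mean` is hypothetical, the coefficients are written in the form $Z_t = \alpha_1 Z_{t-1} + \alpha_2 Z_{t-2} + \varepsilon_t$, and the four-value variance series is made up so the result can be checked by eye.

```python
import math


def se_of_mean(h, ar_coefs):
    """Approximate standard error of the mean: sqrt(sum h_t) / (n * |1 - sum of AR coefficients|)."""
    n = len(h)
    return math.sqrt(sum(h)) / (n * abs(1.0 - sum(ar_coefs)))


# Made-up variances: sqrt(16) / 4 = 1 when there is no autocorrelation.
se_demo = se_of_mean([4.0, 4.0, 4.0, 4.0], (0.0, 0.0))

# The ratio reported in the text, using the values quoted there.
t_ratio = 0.00007 / 0.0001485
```

With the estimates from Output 5.8, the coefficient tuple would be (0.0868, -0.0323), so the divisor $1 - \alpha_1 - \alpha_2$ is about 0.9455 and changes the standard error only slightly.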

Interestingly, and despite the comments above, a simple t test on the $D_t$ data, ignoring all of the
variance structure, gives about the same t. A little thought shows that this could be anticipated for the
special case of this model. The summing of $h_t$ and division by n yields what might be thought of as
an average variance over the period. Because the $\alpha$s are small here, the average of $h_t$ divided by n is
a reasonable approximation of the variance of $\bar{Z}$ and thus of $\bar{D}$. To the extent that the squared
residuals $(D_t - \bar{D})^2$ provide approximate estimates of the corresponding conditional variances $h_t$,
the usual OLS formula, $\sqrt{\frac{1}{n}\sum_{t=1}^{n}(D_t - \bar{D})^2/(n-1)}$, gives an estimate of the standard error of the
mean. Additional care would be required, such as consideration of the assumed unit root structure for
$h_t$ and the error introduced by ignoring the $\alpha$s, to make this into a rigorous argument. However, this
line of reasoning does suggest that the naive t test, produced by PROC MEANS, for example, might
be reasonable for these particular data. There is no reason to expect the naive approach to work well
in general.

This example serves to illustrate several important points. One is that careful checking of model
implications against what happens in the data is a crucial component of proper analysis. This would
typically involve some graphics. Another is that failure to meet assumptions is sometimes not so
important but at other times can render estimates meaningless. Careful thinking and a knowledge of
statistical principles are crucial here. The naive use of statistical methods without understanding the
underlying assumptions and limitations can lead to ridiculous claims. Computational software is not
a replacement for knowledge.


5.2 Cointegration 


5.2.1 Introduction 

In this section you study a dimension k vector $V_t$ of time series. The model $V_t = A_1 V_{t-1} + A_2 V_{t-2} + e_t$
is called a vector autoregression, a “VAR,” of dimension k and order p = 2 (2 lags). It is assumed
that $e_t$ has a multivariate normal distribution with k dimensional mean vector 0, a vector of 0s, and
k x k variance matrix $\Sigma$. The ith element of $V_t$ is the time series $Y_{it} - \mu_i$, so the deviation of each
series from its mean is expressed by the model as a linear function of previous deviations of all series
from their means. For example, the upper-left panel of Output 5.10 shows the logarithms of some
high and low prices for stock of the electronic retailer Amazon.com, extracted by the Internet search
engine Yahoo!

Output 5.10 Amazon.com Data with Cointegrating Plane

One way of fitting a vector model is to simply regress each $Y_{it}$ on lags of itself and the other Ys,
thereby getting estimates of row i of the A coefficient matrices. Using just one lag you specify

PROC REG DATA=AMAZON;
   MODEL HIGH LOW = HIGH1 LOW1;
RUN;

where HIGH1 and LOW1 are lagged values of the log transformed high and low prices. The partial
output in Output 5.11 shows the estimates.


Output 5.11 PROC REG on Amazon.com Data

                   Dependent Variable: high

                   Parameter     Standard
Variable     DF     Estimate        Error    t Value    Pr > |t|
Intercept     1      0.01573      0.00730       2.15      0.0317
high1         1      0.88411      0.05922      14.93      <.0001
low1          1      0.11583      0.05979       1.94      0.0533

                   Dependent Variable: low

                   Parameter     Standard
Variable     DF     Estimate        Error    t Value    Pr > |t|
Intercept     1     -0.01042      0.00739      -1.41      0.1590
high1         1      0.45231      0.05990       7.55      <.0001
low1          1      0.54209      0.06047       8.96      <.0001


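The estimated model becomes

$\begin{pmatrix} Y_{1t} - \mu_1 \\ Y_{2t} - \mu_2 \end{pmatrix} = \begin{pmatrix} .8841 & .1158 \\ .4523 & .5421 \end{pmatrix} \begin{pmatrix} Y_{1,t-1} - \mu_1 \\ Y_{2,t-1} - \mu_2 \end{pmatrix} + \begin{pmatrix} e_{1t} \\ e_{2t} \end{pmatrix}$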


Recall that in a univariate AR(1) process, $Y_t = \alpha Y_{t-1} + e_t$, the requirement $|\alpha| < 1$ was imposed so
that the expression $Y_t = \sum_{j=0}^{\infty} \alpha^j e_{t-j}$ for $Y_t$ in terms of past shocks $e_t$ would “converge”; that is, it
would have weights on past shocks that decay exponentially as you move further into the past. What
is the analogous requirement for the vector process $V_t = A V_{t-1} + e_t$? The answer lies in the
“eigenvalues” of the coefficient matrix A.


5.2.2 Cointegration and Eigenvalues 

Any k x k matrix has k complex numbers called eigenvalues or roots that determine certain
properties of the matrix. The eigenvalues of A are defined to be the roots of the polynomial $|mI - A|$,
where A is the k x k coefficient matrix, I is a k x k identity, and | | denotes a determinant. For the
fitted 2 x 2 matrix above, you find

$\left| m \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} - \begin{pmatrix} .8841 & .1158 \\ .4523 & .5421 \end{pmatrix} \right| = (m - .8841)(m - .5421) - (.4523)(.1158)$

which becomes $m^2 - (.8841 + .5421)m + .4269 = (m - .9988)(m - .4274)$, so the roots of this matrix
are the real numbers 0.9988 and 0.4274. A matrix with unique eigenvalues can be expressed as
$A = ZDZ^{-1}$, where D is a matrix with the eigenvalues of A on the main diagonal and 0 everywhere
else and Z is the matrix of eigenvectors of A. Note that
$A^L = (ZDZ^{-1})(ZDZ^{-1}) \cdots (ZDZ^{-1}) = ZD^LZ^{-1}$. By the same reasoning as in the univariate case, the
predicted deviations from the means L steps into the future are $\hat{V}_{n+L} = A^L V_n$, where $V_n$ is the last
observed vector of deviations. For the 2 x 2 matrix currently under study you have

$A^L = \begin{pmatrix} .8841 & .1158 \\ .4523 & .5421 \end{pmatrix}^L = ZD^LZ^{-1} = Z \begin{pmatrix} 0.9988 & 0 \\ 0 & 0.4274 \end{pmatrix}^L Z^{-1} = Z \begin{pmatrix} 0.9988^L & 0 \\ 0 & 0.4274^L \end{pmatrix} Z^{-1}$

so that the elements of $A^L$ all converge to 0 as L increases.
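The two roots and the slow decay of the powers of A can be confirmed numerically. The Python sketch below uses only the quadratic formula and repeated 2 x 2 multiplication, so it is limited to this small case; the function names are hypothetical, and the matrix entries are the rounded estimates from Output 5.11.

```python
import math

# Lag 1 coefficient matrix from Output 5.11 (rounded to four decimals).
A = [[0.8841, 0.1158],
     [0.4523, 0.5421]]


def eigenvalues_2x2(a):
    """Real roots of |mI - a| = m^2 - trace*m + det for a 2x2 matrix."""
    trace = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = trace * trace - 4.0 * det
    if disc < 0:
        raise ValueError("complex roots are not handled in this sketch")
    root = math.sqrt(disc)
    return (trace + root) / 2.0, (trace - root) / 2.0


def matmul_2x2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def matrix_power_2x2(a, power):
    """a raised to an integer power >= 1 by repeated multiplication."""
    result = a
    for _ in range(power - 1):
        result = matmul_2x2(result, a)
    return result


largest, smallest = eigenvalues_2x2(A)
A_100 = matrix_power_2x2(A, 100)      # still far from the zero matrix
A_10000 = matrix_power_2x2(A, 10000)  # essentially the zero matrix
```

Because the larger root is so close to 1, the entries of the 100th power are still sizable, and it takes thousands of steps before they become negligible.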


Output 5.12 Impulse Response, Lag 1 Model

5.2.3 Impulse Response Function

To illustrate, Output 5.12 shows a bivariate series with both $Y_{1t}$ and $Y_{2t}$ being 0 up to time t = 11,
mimicking constant high and low stock price (log transformed and mean corrected). At time t = 11,
$Y_{1t}$ is shifted to 1 with $Y_{2t}$ remaining at 0, thus representing a shock to the high price; that is,
$V_{11} = (1,0)'$. From then on, $V_{11+L} = A^L V_{11} = A^L (1,0)'$; in other words $V_{11+L}$ traces out the path that
would be followed with increasing lead L in absence of further shocks. The sequence so computed is
called an impulse response function. It is seen that at time t = 12, $Y_{2t}$ responded to the jump in $Y_{1t}$
and increased to about 0.45 while $Y_{1t}$ decreased, following the initial jump, to about 0.88.

Continuing through time, the two series come close together then descend very slowly toward 0. This 
demonstrates the effect of a unit shock to the log of high price. The equilibrium, 0 deviations of both 
series from their mean, is approached slowly due to the eigenvalue 0.9988 being so close to 1. 
Clearly, if it were exactly 1.000, then 1.000 L would not decrease at all and the forecasts would not 
converge to the mean (0). Similarly, any attempt to represent the vector of deviations from the mean 
in terms of an infinite weighted sum of past error vectors will fail (i.e., not converge) if the 
eigenvalues or roots of the coefficient matrix A are one—that is, if A has any unit roots. 
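The impulse response for the lag 1 model is just repeated multiplication of the shocked vector by A. The Python sketch below reproduces the first step quoted above; the function name `impulse_response` is hypothetical, and the matrix holds the rounded estimates from Output 5.11.

```python
# Lag 1 coefficient matrix from Output 5.11 (rounded to four decimals).
A = [[0.8841, 0.1158],
     [0.4523, 0.5421]]


def impulse_response(a, shock, steps):
    """Path of the deviation vector after a one-time shock, with no later shocks."""
    path = [tuple(shock)]
    v = list(shock)
    for _ in range(steps):
        v = [a[0][0] * v[0] + a[0][1] * v[1],
             a[1][0] * v[0] + a[1][1] * v[1]]
        path.append(tuple(v))
    return path


# Unit shock to the first (high price) component.
path = impulse_response(A, (1.0, 0.0), 200)
one_step = path[1]     # (0.8841, 0.4523): the values seen at t = 12
far_out = path[200]    # both components still well above 0
```

Even 200 steps after the shock, both components remain well above 0, reflecting the root near 1.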

When all the eigenvalues of A are less than 1 in magnitude, we say that the vector autoregressive process of order
1, or VAR(1), is stationary, following the terminology from univariate processes. When the true A
has unit roots, nonstandard distributions of estimates will arise just as in the univariate case. Note
that the largest eigenvalue of the estimated matrix here, $\hat{\rho} = 0.9988$, is uncomfortably close to 1, and
it would not be at all surprising to find that the true A matrix has a unit root. The roots here are
analogous to the reciprocals of the roots you found for univariate series, hence the requirement that
these roots be less than 1, not greater, in magnitude.


5.2.4 Roots in Higher-Order Models 

The requirement that the roots are all less than 1 in magnitude is called the stationarity condition. 
Series satisfying this requirement are said to be stationary, although technically, certain conditions on 
the initial observations are also required to ensure constant mean and covariances that depend only 
on the time separation of observations (this being the mathematical definition of stationarity). 

In higher-order vector processes, it is still the roots of a determinantal equation that determine
stationarity. In an order 2 VAR, $V_t = A_1 V_{t-1} + A_2 V_{t-2} + e_t$, the characteristic polynomial is
$|m^2 I - A_1 m - A_2|$, and if all values of m that make this determinant 0 satisfy $|m| < 1$ then the vector
process satisfies the stationarity condition.

Regressing as above on lag 1 and 2 terms in the Amazon.com high and low price series, this
estimated model

$V_t = \begin{pmatrix} 0.98486 & 0.23545 \\ 0.63107 & 0.65258 \end{pmatrix} V_{t-1} + \begin{pmatrix} -0.08173 & -0.13927 \\ -0.22514 & -0.06414 \end{pmatrix} V_{t-2} + e_t$

is found, where the matrix entries are estimates coming from two PROC REG outputs:

Output 5.13 Process of Order 2

                   Dependent Variable: high

                   Parameter     Standard
Variable     DF     Estimate        Error    t Value    Pr > |t|
Intercept     1      0.01487      0.00733       2.03      0.0430
high1         1      0.98486      0.06896      14.28      <.0001
low1          1      0.23545      0.06814       3.46      0.0006
high2         1     -0.08173      0.07138      -1.14      0.2528
low2          1     -0.13927      0.06614      -2.11      0.0357

                   Dependent Variable: low

                   Parameter     Standard
Variable     DF     Estimate        Error    t Value    Pr > |t|
Intercept     1     -0.00862      0.00731      -1.18      0.2386
high1         1      0.63107      0.06877       9.18      <.0001
low1          1      0.65258      0.06795       9.60      <.0001
high2         1     -0.22514      0.07118      -3.16      0.0017
low2          1     -0.06414      0.06595      -0.97      0.3312


Inclusion of lag 3 terms seems to improve the model even further, but for simplicity of exposition,
the lag 2 model will be discussed here. Keeping all the coefficient estimates, the characteristic
equation, whose roots determine stationarity, is

$\left| m^2 \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} - m \begin{pmatrix} 0.98486 & 0.23545 \\ 0.63107 & 0.65258 \end{pmatrix} - \begin{pmatrix} -0.08173 & -0.13927 \\ -0.22514 & -0.06414 \end{pmatrix} \right| = (m - 0.99787)(m - 0.54974)(m - 0.26767)(m + 0.17784)$

Note that again, the largest eigenvalue, $\hat{\rho} = 0.99787$, is very close to 1, and it would not be at all
surprising to find that the characteristic equation $|m^2 I - mA_1 - A_2| = 0$ using the true coefficient
matrices has a unit root. Fountis and Dickey (1989) show that if a vector AR process has a single unit
root, then the largest estimated root, normalized as $n(\hat{\rho} - 1)$, has the same limit distribution as in the
univariate AR(1) case. Comparing $n(\hat{\rho} - 1) = 509(0.99787 - 1) = -1.08$ to the 5% critical value
-11.3, the unit root hypothesis is not rejected. This provides a test for one versus no unit roots and
hence is not as general as tests to be discussed later. Also, no diagnostics have been performed to
check the model adequacy, a prerequisite for validity of any statistical test.
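The factorization can be checked without a polynomial root finder by evaluating the determinant at each stated root. The Python sketch below does this for the two estimated coefficient matrices and also recomputes the normalized statistic; the function name `char_det` is hypothetical, and small nonzero determinants are expected because the estimates and roots are rounded to five decimals.

```python
# Estimated coefficient matrices from Output 5.13.
A1 = [[0.98486, 0.23545],
      [0.63107, 0.65258]]
A2 = [[-0.08173, -0.13927],
      [-0.22514, -0.06414]]


def char_det(m, a1, a2):
    """Determinant of m^2*I - m*a1 - a2 for 2x2 matrices a1 and a2."""
    m11 = m * m - m * a1[0][0] - a2[0][0]
    m12 = -m * a1[0][1] - a2[0][1]
    m21 = -m * a1[1][0] - a2[1][0]
    m22 = m * m - m * a1[1][1] - a2[1][1]
    return m11 * m22 - m12 * m21


stated_roots = [0.99787, 0.54974, 0.26767, -0.17784]
determinants = [char_det(m, A1, A2) for m in stated_roots]  # all near 0

n = 509
rho_hat = max(stated_roots)
normalized_stat = n * (rho_hat - 1.0)  # about -1.08
```

All four determinants come out within rounding error of 0, and the normalized statistic is far above the critical value quoted in the text, so the unit root hypothesis is not rejected.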

Using this vector AR(2) model, a bivariate vector of 0 deviations up to time t = 11 is generated, then 
a unit shock is imposed on the first component, the one corresponding to the high price, and the 
AR(2) used to extrapolate into the future. The code is as follows:

DATA SHOCK;
   /* lagged deviations start at 0 */
   Y12=0; Y22=0; Y11=0; Y21=0;
   DO T=1 TO 100;
      Y1 = .98486*Y11 + .23545*Y21 - .08173*Y12 - .13927*Y22;
      Y2 = .63107*Y11 + .65258*Y21 - .22514*Y12 - .06414*Y22;
      IF T=11 THEN Y1=1;   /* unit shock to the high price */
      OUTPUT;
      Y22=Y21; Y21=Y2; Y12=Y11; Y11=Y1;
   END;
RUN;

PROC GPLOT DATA=SHOCK;
   PLOT (Y1 Y2)*T / OVERLAY HREF=11;
   SYMBOL1 V=DOT I=JOIN C=RED;
   SYMBOL2 V=DOT I=JOIN C=GREEN;
RUN;
QUIT;


The graph of this impulse response function is shown in Output 5.14. 


Output 5.14 Impulse Response, Lag 2 Model

The addition of the second lag produces a more interesting pattern immediately following the shock
to the high price logarithm series, but in the long run the series again approach each other and 
descend in tandem to the (0,0) equilibrium deviation from the mean. 

The forecasts might not have returned to the (0,0) equilibrium point if the true coefficient matrices 
rather than estimates had been used. The behavior in the estimated model could simply be the result 
of the highest estimated root 0.99787 being a slight underestimate of a root that is really 1. Notice 
that a number even slightly smaller than 1 will reduce to nearly 0 when raised to a large exponent, as 
happens when the impulse response is extrapolated into the future. Models that allow exact unit roots 
in vector processes will be discussed next. 


5.2.5 Cointegration and Unit Roots 

An interesting class of models with exact unit roots is the class of cointegrated vector processes that 
can be represented in a type of model called the error correction model. Cointegration refers to a case 
in which a vector process, like the one with logarithms of high and low prices currently under 
discussion, has individually nonstationary components but there is some linear combination of them 
that is stationary. To make things a little clearer, suppose it is hypothesized that the ratio of high to 
low prices is stable; specifically, the daily price ratio series log(high/low) = log(high) - log(low) is 
stationary even though the log(high) and log(low) series each have unit roots. In this case, a shock to 
the high price series will result in an impulse response in which both series move as before, but they 
will not move back toward any historical mean values. Rather they will move toward some 
equilibrium pair of values for which log(high) - log(low) equals its long-term mean. 

You can check spread = log(high) - log(low) for stationarity with no new tools—simply create the 
daily spread series and perform a unit root test on it. Here is some code to do the test and to check to 
see if 3 autoregressive lags (and hence 2 lagged differences) are sufficient to reduce the errors to 
white noise. 

PROC ARIMA DATA=AMAZON;
   I VAR=SPREAD STATIONARITY=(ADF=(2));
   E P=3;
RUN;

As shown in Output 5.15, the tests strongly reject the unit root null hypothesis and thus indicate 
stationarity. The zero mean test would be useful only if one is willing to assume a zero mean for 
log(high) - log(low), and since high > low always, such an assumption is untenable for these data. 
Also shown are the chi-square tests for a lag 3 autoregression. They indicate that lagged differences 
beyond the second, $Y_{t-2} - Y_{t-3}$, are unnecessary and the fit appears to be excellent. This also
suggests that an increase in the bivariate system to 3 lags might be helpful, as has previously been
mentioned.

Output 5.15 Stationary Test for High-Low Spread

                  Augmented Dickey-Fuller Unit Root Tests

Type           Lags         Rho    Pr < Rho      Tau    Pr < Tau        F    Pr > F
Zero Mean         2    -18.3544      0.0026    -3.00      0.0028
Single Mean       2    -133.290      0.0001    -7.65      <.0001    29.24    0.0010
Trend             2    -149.588      0.0001    -8.05      <.0001    32.41    0.0010

                  Conditional Least Squares Estimation

                            Standard                 Approx
Parameter    Estimate          Error    t Value    Pr > |t|    Lag
MU            0.07652      0.0043870      17.44      <.0001      0
AR1,1         0.38917        0.04370       8.91      <.0001      1
AR1,2         0.04592        0.04702       0.98      0.3293      2
AR1,3         0.18888        0.04378       4.31      <.0001      3

                  Autocorrelation Check of Residuals

 To      Chi-            Pr >
Lag    Square    DF     ChiSq    ----------------Autocorrelations----------------
  6      4.63     3    0.2013    -0.001   -0.018   -0.009   -0.047    0.072   -0.033
 12      9.43     9    0.3988     0.037    0.041    0.035    0.025    0.029    0.058
 18     12.71    15    0.6248    -0.018   -0.046    0.025   -0.014    0.016    0.052
 24     21.14    21    0.4506     0.017    0.023   -0.067   -0.074    0.059    0.038
 30     25.13    27    0.5669     0.026    0.049    0.014   -0.012   -0.006    0.063
 36     28.86    33    0.6734     0.013    0.038    0.049   -0.023    0.016   -0.045
 42     33.05    39    0.7372     0.049    0.055    0.023    0.010    0.039    0.003
 48     36.51    45    0.8125     0.030   -0.035   -0.050   -0.038    0.006   -0.004


It appears that spread = log(high) - log(low) is stationary according to the unit root tests. That
means standard distribution theory should provide accurate tests since the sample size n = 502 is not
too small. In that light, notice that the mean estimate 0.07652 for spread is significantly different
from 0. An estimate of the number toward which the ratio of high to low prices tends to return is
$e^{0.07652} = 1.08$ with a 95% confidence interval extending from $e^{0.07652 - (1.96)(0.004387)} = 1.07$ to
$e^{0.07652 + (1.96)(0.004387)} = 1.09$. You conclude that the high tends to be 7% to 9% higher than the low in the
long run.
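The interval for the ratio is obtained by exponentiating the endpoints of the usual interval for the mean of the log spread. The Python sketch below repeats that arithmetic with the MU estimate and standard error from Output 5.15; the variable names are hypothetical.

```python
import math

mu_hat = 0.07652        # estimated mean of log(high) - log(low)
std_err = 0.004387      # its standard error from Output 5.15

ratio = math.exp(mu_hat)                      # about 1.08
lower = math.exp(mu_hat - 1.96 * std_err)     # about 1.07
upper = math.exp(mu_hat + 1.96 * std_err)     # about 1.09
```

Exponentiating the endpoints, rather than building an interval around 1.08 directly, keeps the interval on the ratio scale consistent with the one on the log scale.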
You see that testing for cointegration is easy if you can prespecify the linear combination, e.g.,
$S_t$ = spread = log(high) - log(low). Often one only suspects that some linear combination $Y_{1t} - \beta Y_{2t}$
is stationary, where $(Y_{1t}, Y_{2t})$ is a bivariate time series, so the problem involves estimating $\beta$ as well
as testing the resulting linear combination for stationarity. Engle and Granger (1987) argue that if
you use regression to estimate $\beta$, your method is somewhat like sorting through all linear
combinations of log(high) and log(low) to find the most stationary-looking linear combination.
Therefore if you use the standard critical values for this test as though you knew $\beta$ from some
external source, your nominal level 0.05 would understate the true probability of falsely rejecting the
unit root null hypothesis. Their solution was to compute residuals $r_t = Y_{1t} - \hat{\beta} Y_{2t}$ from a least squares
regression of $Y_{1t}$ on $Y_{2t}$ and run a unit root test on these residuals, but then to compare the test
statistic to special critical values that they supplied. This is a relatively easy and intuitively pleasing
approach; however, it is not clear which of two or more series to use as the dependent variable in
such a regression.

More symmetric approaches were suggested by Stock and Watson (1988) and Johansen 
(1988, 1991). Stock and Watson base their approach on a principal components decomposition of the 
vector time series, and Johansen’s method involves calculating standard quantities, canonical 
correlations, from a multivariate multiple regression and then figuring out what distributions these 
would have in the vector time series case with multiple unit roots. Both strategies allow testing for 
multiple unit roots. For further comparisons among these approaches and an application to a 
macroeconomic vector series, see Dickey, Janssen, and Thornton (1991). 


5.2.6 An Illustrative Example 

To get a little better feeling for cointegration, consider this system with known coefficients: 


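\[
\begin{pmatrix} Y_{1t} - 15 \\ Y_{2t} - 10 \end{pmatrix}
= \begin{pmatrix} 1.84 & -0.24 \\ -0.06 & 1.66 \end{pmatrix}
\begin{pmatrix} Y_{1,t-1} - 15 \\ Y_{2,t-1} - 10 \end{pmatrix}
+ \begin{pmatrix} -0.88 & 0.28 \\ 0.07 & -0.67 \end{pmatrix}
\begin{pmatrix} Y_{1,t-2} - 15 \\ Y_{2,t-2} - 10 \end{pmatrix}
+ \begin{pmatrix} e_{1t} \\ e_{2t} \end{pmatrix}
\]

Suppose $Y_{1t} = 15$ and $Y_{2t} = 10$ up to time 11, where a shock takes place. What happens after time 11
if no further shocks come along? That is, what does the impulse response function look like?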
The left panels of Output 5.16 show the results of setting the pair $(Y_1, Y_2)$ to (15,7), (15,13), and
(17,12). It is seen that a change in either coordinate at time 11 results in the ultimate shifting of both
coordinates. Also it is seen that there can be a lot of wiggling as the new levels are approached or
there can be a relatively monotone approach of each coordinate to its new level. An insight into this
behavior is given by the plots of these three impulse response functions and several others in the
three-dimensional plots in the right column of the graph.

The axes represent $Y_1$, $Y_2$, and time t. All series set $(Y_1, Y_2) = (15,10)$ up to time t = 11, thus
forming a “means axis.” The top-right panel shows eight possible shocks at time t = 11, fanning out
in an asterisk-shaped pattern. The middle plot on the right adds in the eight resulting impulse
response curves, and the bottom-right plot is just a rotated view of the middle plot, with time
measured by depth into the plot. In the first and second plots, time increases with movement to the
right, the height of a point is $Y_1$, and its depth back into the plot is $Y_2$. The plots include a 0 shock
case that forms a continuation of the means axis. For a while after the shock at time 11, there can be
substantial wiggling or relatively smooth movement. What is striking is that as time passes, the
points all seem to align in a plane. This plane is interpreted as a long-term relationship that will be
approached over time after a shock bumps the point off of it (the plane). This gives rise to the term
“error correction,” meaning that movement off the plane is an “error,” and in the long run in the
absence of shocks, the points will move back to the equilibrium represented by the plane: an error
“correction.” A single shock can send the system into fairly wild fluctuations that, depending on what
the series represent, might frighten investors, but these are temporary and the vector ultimately will
settle near the plane of equilibrium. This equilibrium plane is interpreted as a relationship that cannot
be dramatically violated for long periods of time by the system. Envision the plane as an “attractor,”
exerting a force like gravity on the points to settle them down after a shock.
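
The impulse responses can be traced numerically. The following is a minimal Python sketch, not the code used to draw Output 5.16: it iterates the known-coefficient system with no shocks after the initial displacement. The function name `impulse_response` and the 1000-step horizon are illustrative choices.

```python
A1 = [[1.84, -0.24], [-0.06, 1.66]]
A2 = [[-0.88, 0.28], [0.07, -0.67]]
MEAN = (15.0, 10.0)


def matvec(m, v):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


def impulse_response(shocked_point, steps):
    """Path of (Y1, Y2) after the series is moved to shocked_point at one
    time point, starting from the means and with no further shocks."""
    prev = [0.0, 0.0]  # deviation from the means just before the shock
    curr = [shocked_point[0] - MEAN[0], shocked_point[1] - MEAN[1]]
    path = [tuple(shocked_point)]
    for _ in range(steps):
        a = matvec(A1, curr)
        b = matvec(A2, prev)
        prev, curr = curr, [a[0] + b[0], a[1] + b[1]]
        path.append((curr[0] + MEAN[0], curr[1] + MEAN[1]))
    return path


for start in [(15, 7), (15, 13), (17, 12)]:
    y1, y2 = impulse_response(start, 1000)[-1]
    print(start, "->", round(y1, 4), round(y2, 4), "difference", round(y1 - y2, 4))
```

In each case both coordinates settle at new levels whose difference is 5, which is the equilibrium relationship the plane represents.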

Further insights are given by a bit of mathematics. Note that a vector VAR(2) model of dimension k,
$V_t = A_1 V_{t-1} + A_2 V_{t-2} + e_t$, can be algebraically written in terms of differenced vectors
$\nabla V_t = V_t - V_{t-1}$ and a lagged vector $V_{t-1}$ as

\[ \nabla V_t = -(I - A_1 - A_2)V_{t-1} - A_2 \nabla V_{t-1} + e_t \]

where $(I - A_1 - A_2)$ is $Im^2 - A_1 m - A_2$ evaluated at m = 1. So if $|I - A_1 - A_2| = 0$ (that is, if this
matrix is less than full rank) then the time series has a unit root m = 1. Any k x k matrix $\Pi$ that has
rank r < k can be written as $\Pi = \alpha\beta'$, where $\alpha$ and $\beta$ are full-rank k x r matrices. Using the A
matrices currently under discussion, the model

\[
\begin{pmatrix} Y_{1t} - 15 \\ Y_{2t} - 10 \end{pmatrix}
= \begin{pmatrix} 1.84 & -0.24 \\ -0.06 & 1.66 \end{pmatrix}
\begin{pmatrix} Y_{1,t-1} - 15 \\ Y_{2,t-1} - 10 \end{pmatrix}
+ \begin{pmatrix} -0.88 & 0.28 \\ 0.07 & -0.67 \end{pmatrix}
\begin{pmatrix} Y_{1,t-2} - 15 \\ Y_{2,t-2} - 10 \end{pmatrix}
+ \begin{pmatrix} e_{1t} \\ e_{2t} \end{pmatrix}
\]

becomes

\[
\begin{pmatrix} \nabla Y_{1t} \\ \nabla Y_{2t} \end{pmatrix}
= \begin{pmatrix} -0.04 & 0.04 \\ 0.01 & -0.01 \end{pmatrix}
\begin{pmatrix} Y_{1,t-1} - 15 \\ Y_{2,t-1} - 10 \end{pmatrix}
+ \begin{pmatrix} 0.88 & -0.28 \\ -0.07 & 0.67 \end{pmatrix}
\begin{pmatrix} \nabla Y_{1,t-1} \\ \nabla Y_{2,t-1} \end{pmatrix}
+ \begin{pmatrix} e_{1t} \\ e_{2t} \end{pmatrix}
\]

\[
= \begin{pmatrix} -0.04 \\ 0.01 \end{pmatrix}(Y_{1,t-1} - Y_{2,t-1} - 5)
+ \begin{pmatrix} 0.88 & -0.28 \\ -0.07 & 0.67 \end{pmatrix}
\begin{pmatrix} \nabla Y_{1,t-1} \\ \nabla Y_{2,t-1} \end{pmatrix}
+ \begin{pmatrix} e_{1t} \\ e_{2t} \end{pmatrix}
\]

The interpretation here is that $S_t = Y_{1t} - Y_{2t} - 5$ is stationary; that is, it tends to be near 0 so that the
difference $Y_{1t} - Y_{2t}$ tends to be near 5. This algebraic form of the model is known as the “error
correction model,” or ECM. The plane satisfying $Y_{1t} = Y_{2t} + 5$ at every t is the attractor toward
which all the impulse response functions are moving in the three-dimensional plots. A vector (a,b)
such that $(a,b)(Y_{1t}, Y_{2t})'$ is stationary is called a cointegrating vector, so in this case $\beta' = (-1,1)$ is
such a vector, as are (-2,2), (1,-1), and any nonzero vector of the form $(-\phi, \phi) = \phi(-1,1)$. The set of
all linear combinations of the rows of $\beta'$ constitutes the set of all possible cointegrating vectors in
the general case.
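
The rank-one structure of the lagged-level coefficient matrix can be confirmed directly from the two A matrices. The sketch below is illustrative only. It uses the sign convention alpha = (-0.04, 0.01)' and beta' = (1, -1), which is one of the equivalent multiples of the cointegrating vector.

```python
A1 = [[1.84, -0.24], [-0.06, 1.66]]
A2 = [[-0.88, 0.28], [0.07, -0.67]]
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]

# Coefficient matrix on the lagged levels in the ECM: -(I - A1 - A2).
impact = [[A1[i][j] + A2[i][j] - IDENTITY[i][j] for j in range(2)]
          for i in range(2)]

alpha = [-0.04, 0.01]  # adjustment coefficients
beta = [1.0, -1.0]     # cointegrating vector (any nonzero multiple also works)
outer = [[alpha[i] * beta[j] for j in range(2)] for i in range(2)]

# A zero determinant means the matrix is less than full rank (a unit root).
determinant = impact[0][0] * impact[1][1] - impact[0][1] * impact[1][0]

print("impact matrix:", [[round(x, 4) for x in row] for row in impact])
print("alpha beta':  ", outer)
print("determinant:  ", round(determinant, 12))
```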


Next consider $N_t = Y_{1t} + 4Y_{2t} = (1,4)(Y_{1t}, Y_{2t})'$ and note that $\nabla N_t = (1,4)(\nabla Y_{1t}, \nabla Y_{2t})'$.
Multiplying the vector equation on both sides by the row vector (1,4), it is seen that $\nabla N_t$ involves
lagged levels only through the term

\[ (1,4)\begin{pmatrix} -0.04 \\ 0.01 \end{pmatrix}(Y_{1,t-1} - Y_{2,t-1} - 5) = 0 \]

That is, $\nabla N_t$ in fact does not involve the lag levels of the variables at all. It is strictly expressible in
terms of differences, so $N_t$ is a unit root process. Also, because the only constant in the model, -5,
is captured in the term $Y_{1,t-1} - Y_{2,t-1} - 5$ and is thus annihilated in the $\nabla N_t$ equation, it follows that
$N_t$ has no drift. $N_t$ is a stochastic common trend shared by $Y_{1t}$ and $Y_{2t}$. The two interesting linear
combinations, the nonstationary $N_t = Y_{1t} + 4Y_{2t}$ and the stationary $S_t = Y_{1t} - Y_{2t} - 5$, can be written
as

\[
\begin{pmatrix} N_t \\ S_t \end{pmatrix}
= \begin{pmatrix} 1 & 4 \\ 1 & -1 \end{pmatrix}\begin{pmatrix} Y_{1t} \\ Y_{2t} \end{pmatrix}
- \begin{pmatrix} 0 \\ 5 \end{pmatrix}
= T\begin{pmatrix} Y_{1t} \\ Y_{2t} \end{pmatrix} - \begin{pmatrix} 0 \\ 5 \end{pmatrix}
\]

from which you see that

\[
\begin{pmatrix} Y_{1t} \\ Y_{2t} \end{pmatrix}
= T^{-1}\begin{pmatrix} N_t \\ S_t + 5 \end{pmatrix}
= \begin{pmatrix} 0.2 & 0.8 \\ 0.2 & -0.2 \end{pmatrix}\begin{pmatrix} N_t \\ S_t + 5 \end{pmatrix}
\]

Thus it becomes clear exactly how the nonstationary common trend $N_t$ is part of both Y series. For
$\Pi = \alpha\beta'$, with $\alpha$ and $\beta$ both k x r matrices, the matrix T can always be constructed by stacking $\alpha_p'$
above $\beta'$, where $\alpha_p'$ is a (k-r) x k matrix such that $\alpha_p'\alpha = 0$.
As a final insight, multiply both sides of the VAR in error correction form by the transformation
matrix T to get

\[
\begin{pmatrix} \nabla N_t \\ \nabla S_t \end{pmatrix}
= \begin{pmatrix} 0 \\ -0.05 \end{pmatrix} S_{t-1}
+ \begin{pmatrix} 0.60 & 0 \\ 0 & 0.95 \end{pmatrix}
\begin{pmatrix} \nabla N_{t-1} \\ \nabla S_{t-1} \end{pmatrix}
+ \begin{pmatrix} z_{1t} \\ z_{2t} \end{pmatrix}
\]

where the z white noise errors are linear combinations of the e errors. The coefficient matrix for the
lagged differences of N and S is diagonal, which would not be the case in general. Nevertheless there
does always exist a transformation matrix T such that the vector $TV_t$ contains

1. as many unit root processes as the series has unit roots, followed by

2. stationary processes (provided none of the original Y series requires second differencing to
achieve stationarity).
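
As a check on the transformation in this example, the following Python sketch applies T, built from the rows (1, 4) and (1, -1), to the error correction coefficients. It should be read as an illustration under the example's known coefficients; the helper `matmul` is just a small utility written for it.

```python
T = [[1.0, 4.0], [1.0, -1.0]]            # rows give N (common trend) and S
T_INV = [[0.2, 0.8], [0.2, -0.2]]        # inverse of T
ALPHA = [-0.04, 0.01]                    # coefficients on Y1 - Y2 - 5
NEG_A2 = [[0.88, -0.28], [-0.07, 0.67]]  # coefficients on lagged differences


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


# Effect of the lagged stationary combination on (N, S) differences.
level_coef = [sum(T[i][k] * ALPHA[k] for k in range(2)) for i in range(2)]

# Lagged-difference coefficients expressed in the (N, S) coordinates.
diff_coef = matmul(matmul(T, NEG_A2), T_INV)

print("level coefficients:", [round(x, 4) for x in level_coef])
print("difference coefficients:", [[round(x, 4) for x in row] for row in diff_coef])
```

The zero in the first level coefficient is what makes N a unit root process, and the diagonal difference matrix is what lets N and S be studied one at a time.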

The period of the sinusoidal waves follows from the mathematical model. With the diagonal
coefficient matrix in this example, it is easy to describe the stationary component as
$\nabla S_t = -.05S_{t-1} + .95\nabla S_{t-1} + z_{2t}$ or $S_t = 1.9S_{t-1} - .95S_{t-2} + z_{2t}$ with characteristic polynomial
$m^2 - 1.9m + .95 = (m - \sqrt{.95}\,e^{i\theta})(m - \sqrt{.95}\,e^{-i\theta})$. Here the representation of a complex number as
$re^{i\theta} = r[\cos(\theta) + i\sin(\theta)]$ can be used with the fact that $\sin(-\theta) = -\sin(\theta)$ to show that
$\sqrt{.95}(e^{i\theta} + e^{-i\theta}) = 2\sqrt{.95}\cos(\theta)$ must equal 1.9 so that the angle $\theta$ is
$\theta = \arccos(1.9/(2\sqrt{.95})) = 12.92$ degrees. In the graphs one expects 110(12.92/360) = 4 cycles in the
110 observations after the shock. This is precisely what the graphs in Output 5.16 show, with
$(\sqrt{.95})^L$ giving the amplitude damping factor L periods after the shock.
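
The angle and the cycle count can be reproduced from the roots of the characteristic polynomial. This is a small Python sketch; the variable names are illustrative.

```python
import cmath
import math

# Roots of m^2 - 1.9 m + 0.95 via the quadratic formula (complex conjugates).
discriminant = cmath.sqrt(1.9 ** 2 - 4 * 0.95)
root = (1.9 + discriminant) / 2

modulus = abs(root)                          # damping factor per period
theta_deg = math.degrees(cmath.phase(root))  # angle of the root in degrees
cycles = 110 * theta_deg / 360               # cycles in 110 observations

print(f"modulus {modulus:.4f}, angle {theta_deg:.2f} degrees, cycles {cycles:.2f}")
```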

The relationship of $S_t$ to $Y_{1t}$ and $Y_{2t}$ determines the orientation of this sinusoidal fluctuation in the
three-dimensional plots. For the cases with equal shocks to $Y_{1t}$ and $Y_{2t}$, no fluctuations were seen.
That is because for these cases $S_t = Y_{1t} - Y_{2t} - 5$ is no different after the shocks than before, so the
shocked points are still in the cointegrating plane. With $\nabla N_t = .60\nabla N_{t-1}$ describing the component of
motion in the cointegrating plane, one expects an exponential increase, in the equal shock cases, to a
new horizontal line contained in the cointegrating plane. That indeed is what happens. The cases with
unequal shocks to the two Y components force the point off the cointegrating plane, initiating a
ripple-like fluctuation about the plane as the new levels are approached. In the bottom-right plot of
Output 5.16, where passing time moves you toward the back of the plot, it is seen that the
cointegrating plane slopes from the upper left to the lower right while the sinusoidal fluctuations
seem to move from lower left to upper right and back again repeatedly as time passes.





You have learned some of the terminology and seen some geometric implications of cointegration in
a hypothetical model with known parameters. You have seen from the graphs, or in more detail from
the mathematical analysis, that the error correction model defines a simple linear attractor, a line,
plane, or hyperplane, toward which forecasts gravitate. It can capture some fairly complicated
short-term dynamics. You now look for cointegrating relationships like the $S_t$ formula and common
trends like $N_t$ in the Amazon.com data.

For the Amazon.com stocks it appeared that the relationship log(high) - log(low) was stationary
with average value about 0.0765. In a three-dimensional plot of $H_t$ = log(high), $L_t$ = log(low), and
time t, one would expect the points to stay close to a plane having log(high) = log(low) + 0.0765
over time. That plane and the data were seen in Output 5.10. In the upper-left panel both series are
plotted against time and it is seen that they almost overlay each other. The upper-right panel plots $L_t$
versus t in the floor and $H_t$ versus t in the back wall. These are projections into the floor and back
wall of the points $(t, L_t, H_t)$, which are seen moving from the lower left to upper right while staying
quite close to a sloping plane. This is the cointegrating plane. The lower-right panel shows this same
output rotated so that points move out toward the observer as time passes, and to its left the rotation
continues so you are now looking directly down the edge of the cointegrating plane. This is also the
graph of $H_t$ versus $L_t$ and motivates the estimation of the cointegrating plane by regression, as
suggested by Engle and Granger (1987).

Using ordinary least squares regression you estimate an error correction model of the form

\[
\begin{pmatrix} \nabla L_t \\ \nabla H_t \end{pmatrix}
= \begin{pmatrix} -.4185 \\ 0.1799 \end{pmatrix} S_{t-1}
+ \begin{pmatrix} -.0429 & 0.2787 \\ 0.4170 & -.1344 \end{pmatrix}
\begin{pmatrix} \nabla L_{t-1} \\ \nabla H_{t-1} \end{pmatrix}
\]

where $H_t$ and $L_t$ are log transformed high and low prices, $S_t = H_t - L_t - .0765$, and .0765 is the
estimated mean of $H_t - L_t$. Thus $0.18L_t + 0.42H_t$ is the common trend unit root process. It can be
divided by 0.6 to make the weights sum to 1, in which case the graph will, of course, look like those
of the original series that were so similar to each other in this example. It is a weighted average of
things that are almost the same as each other.


5.2.7 Estimating the Cointegrating Vector 

In the Amazon.com data, it was easy to guess that log(high/low) would be stationary and hence that 
log(high) - log(low) is the cointegrating relationship between these two series. This made the 
analysis pretty straightforward. A simple unit root test on log(high) - log(low) sufficed as a 
cointegration test. The high and low prices are so tightly cointegrated that it is clear from the outset 
the data will produce a nice example. In other cases, the data may not be so nice and the nature of the 
cointegrating plane might not be easily anticipated as it was here. The complete cointegration 
machinery includes tests that several series are cointegrated and methods for estimating the
cointegrating relationships. If you estimate that $H_t - bL_t$ is stationary in the Amazon.com example,
you might want to test to see if b is an estimate of 1.00 to justify the coefficient of $L_t$ in
$S_t = H_t - L_t - .0765$. The techniques include tests of such hypotheses about the cointegrating
parameters.

The number of cointegrating relations in a process with known parameters is the rank of the
coefficient matrix, $\Pi$, on the lagged levels in the error correction representation. In the previous
hypothetical known parameter example you saw that this matrix was

\[
\Pi = \begin{pmatrix} -0.04 & 0.04 \\ 0.01 & -0.01 \end{pmatrix}
= \begin{pmatrix} -0.04 \\ 0.01 \end{pmatrix}(1 \;\; -1) = \alpha\beta'
\]

which is clearly a rank-one matrix. This factoring of the matrix not only shows that there is one
cointegrating relationship, but also reveals its nature: from the vector (1 -1) it is seen that the
difference in the bivariate vector’s elements is the linear combination that is stable; that is, it stays
close to a constant. This happens to be the same cointegrating relationship that seemed to apply to the
Amazon.com case and was displayed in the lower-left corner of Output 5.10. A vector time series of
dimension 3 could move around anywhere in three-dimensional space as time passes. However, if its
lag level coefficient matrix is


r .n 


n 


-.3 

-.2 


(1 -.5 


-•5) 


then the points $(Y_{1t}, Y_{2t}, Y_{3t})$ will stay near the plane $Y_{1t} - .5Y_{2t} - .5Y_{3t} = C$ for some constant C as
time passes. This is a plane running obliquely through three-dimensional $(Y_{1t}, Y_{2t}, Y_{3t})$ space just as
the line in the lower-left corner of Output 5.10 runs obliquely through two-dimensional space. In
this case there is one cointegrating vector (1, -.5, -.5) and thus two common trends. You can think of
these as two dimensions in which the series is free to float without experiencing a “gravitational pull”
back toward the plane, just as our bivariate series was free to float up and down along the diagonal
line in the lower-left corner of Output 5.10. Because time added to $Y_{1t}, Y_{2t}, Y_{3t}$ introduces a fourth
dimension, no graph analogous to the plane in the Amazon.com example is possible.

As a second example, if 


\[
\Pi = \begin{pmatrix} 0.2 & 0.2 \\ 0.1 & -.5 \\ 0.8 & 0.1 \end{pmatrix}
\begin{pmatrix} 1 & -.5 & -.5 \\ 1 & .2 & -.6 \end{pmatrix}
\]


then the points $(Y_{1t}, Y_{2t}, Y_{3t})$ will stay near the line formed by the intersection of two planes:
$Y_{1t} - .5Y_{2t} - .5Y_{3t} = C_1$ and $Y_{1t} + .2Y_{2t} - .6Y_{3t} = C_2$. In this last example, there are two cointegrating
vectors and one common trend. That is, there is one dimension, the line of intersection of the planes,
along which points are free to float.

SAS/ETS software provides PROC VARMAX to do this kind of modeling as well as allowing 
exogenous variables and moving average terms (hence the X and MA in VARMAX). Note that a lag 
1 and a lag 2 bivariate autoregression have been fit to the Amazon.com data, but no check has yet 
been provided as to whether 2 lags are sufficient. In fact, a regression of the log transformed high and 
low stock prices on their lags indicates that 3 lags may in fact be needed. A popular method by 
Johansen will be described next. It involves squared canonical correlations. Let W and Y be two
random vectors. Pick a linear combination of elements of W and one of Y in such a way as to 
maximize the correlation. That correlation is the highest canonical correlation. Using only the linear 
combinations of W that are not correlated with the first, and similarly for Y, pick the linear 
combination from each set that produces the most highly correlated pair. That's the second highest 
canonical correlation, etc. 

Let W and Y be two random mean 0 vectors related by $Y = \Pi W + e$, where $\Pi$ is a k x k matrix of
rank r. Let $\Sigma_{YY}$, $\Sigma_{WW}$, and $\Sigma_{ee}$ denote the variance matrices of Y, W, and e, and assume W and e are
uncorrelated. Let $\Sigma_{YW} = E\{YW'\} = \Pi\Sigma_{WW}$. The problem of finding vectors $\gamma_j$ and scalars $\lambda_j$ for
which

\[ (\Sigma_{YW}\Sigma_{WW}^{-1}\Sigma_{YW}' - \lambda_j\Sigma_{YY})\gamma_j = 0 \]

or equivalently

\[ (\Sigma_{YW}\Sigma_{WW}^{-1}\Sigma_{YW}'\Sigma_{YY}^{-1} - \lambda_j I)(\Sigma_{YY}\gamma_j) = 0 \]

is an eigenvalue problem. The solutions $\lambda_j$ are the squared canonical correlations between Y and W,
and since the rank of $\Pi$ is r, there must be k - r linearly independent vectors $\gamma_j$ such that
$\Sigma_{YW}'\gamma_j = \Sigma_{WW}\Pi'\gamma_j = 0$. For these you can solve the eigenvalue equation using $\lambda_j = 0$; that is, there
are k - r eigenvalues equal to 0. It is seen that finding the number of cointegrating vectors r is
equivalent to finding the number of nonzero eigenvalues for the matrix $\Sigma_{YW}\Sigma_{WW}^{-1}\Sigma_{YW}'\Sigma_{YY}^{-1}$.
Johansen’s test involves estimating these variance and covariance matrices and testing the resulting
estimated eigenvalues.
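
The link between the rank of the coefficient matrix and the number of zero eigenvalues can be illustrated with known matrices. The Python sketch below takes the rank-one lagged-level matrix from the hypothetical example and, purely as an assumption for illustration, identity variance matrices for W and e. It shows only the algebra, not Johansen's estimation procedure.

```python
import math

PI = [[-0.04, 0.04], [0.01, -0.01]]   # rank-one coefficient matrix
SIGMA_WW = [[1.0, 0.0], [0.0, 1.0]]   # assumed variance of W (illustrative)
SIGMA_EE = [[1.0, 0.0], [0.0, 1.0]]   # assumed variance of e (illustrative)


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]


def inverse(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


def eigenvalues(a):
    """Eigenvalues of a 2x2 matrix whose eigenvalues are real."""
    trace = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    root = math.sqrt(max(trace * trace - 4 * det, 0.0))
    return sorted([(trace - root) / 2, (trace + root) / 2])


sigma_yw = matmul(PI, SIGMA_WW)
signal = matmul(sigma_yw, transpose(PI))          # Pi Sigma_WW Pi'
sigma_yy = [[signal[i][j] + SIGMA_EE[i][j] for j in range(2)] for i in range(2)]

m = matmul(matmul(matmul(sigma_yw, inverse(SIGMA_WW)), transpose(sigma_yw)),
           inverse(sigma_yy))
smallest, largest = eigenvalues(m)
print("squared canonical correlations:", smallest, largest)
```

With k = 2 and r = 1 there is exactly k - r = 1 eigenvalue equal to 0 (up to rounding), and one nonzero squared canonical correlation.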

Begin with a lag 1 model $V_t = AV_{t-1} + e_t$ or $\nabla V_t = -(I - A)V_{t-1} + e_t$. Johansen’s method (1988,
1991) consists of a regression of $\nabla V_t$ on $V_{t-1}$; that is, each element of $\nabla V_t$ is regressed on all the
elements in $V_{t-1}$ to produce the rows of the estimated $-(I - A)$ coefficient matrix. For a lag 1 model,
$-(I - A) = \Pi = \alpha\beta'$, where the rows of $\beta'$ are the cointegrating vectors and the following three
numbers are all the same:

r = the rank of I - A

r = the number of cointegrating vectors, or rows of $\beta'$

r = the number of nonzero squared canonical correlations between
the elements of $\nabla V_t$ and those of $V_{t-1}$.

Johansen suggested studying the estimated squared canonical correlation coefficients to decide how
many of them are significantly different from 0 and thereby estimate r. Standard procedures such as 
PROC CANCORR will deliver the desired estimates, just as an ordinary regression program will 
deliver the test statistics for a univariate unit root test but not the right p-values. As with the 
univariate unit root tests, the distributions of tests based on the squared canonical correlation 
coefficients are nonstandard for unit root processes, such as those found in the error correction 
model. Johansen tabulated the required distributions, thus enabling a test for r, the number of 
cointegrating vectors. 

In the Amazon.com data, it appeared that $\beta'$ could be taken as any multiple of the vector
H' = (1,-1). Johansen also provides a test of the null hypothesis that $\beta = H\phi$, where H is, as in the
case of the Amazon.com data, a matrix of known constants. The test essentially compares the
squared canonical correlations between $\nabla V_t$ and $V_{t-1}$ to those between $\nabla V_t$ and $H'V_{t-1}$. If
$\nabla V_t = \alpha\beta'V_{t-1} + e_t$ and $\beta = H\phi$, you can easily see that $\nabla V_t = \alpha\phi'(H'V_{t-1}) + e_t$, which motivates the
test. In the Amazon.com data, if $\beta'$ is some multiple of H' = (1,-1), one would expect the two
squared canonical correlations between $\nabla V_t$ and $V_{t-1}$ to consist of one number near 0 and
another number nearly equal to the squared canonical correlation between $\nabla V_t$ and
$S_{t-1}$ = spread = $H'V_{t-1} = (1,-1)V_{t-1}$ = log(high) - log(low). The test that the first number is near 0 is a
test for the cointegrating rank and involves nonstandard distributions. Given that there is one
cointegrating vector, the test that its form is (1,-1) is the one involving comparison of two
eigenvalues and, interestingly, is shown by Johansen to have a standard chi-square distribution under
the null hypothesis in large samples.


5.2.8 Intercepts and More Lags 

PROC VARMAX gives these tests and a lot of additional information for this type of model. Before 
using PROC VARMAX on the Amazon.com data, some comments about higher-order processes and 
the role of the intercept are needed. Up to now, vector $V_t$ was assumed to have been centered so that
no intercept was needed in the model. Suppose now that

\[ V_t - \mu = A(V_{t-1} - \mu) + e_t \quad \text{and} \quad V_t = \lambda + AV_{t-1} + e_t \]

where $\mu$ is a vector of means. The left-hand equation will be referred to as the “deviations form” for
the model. In order for these two equations to be equivalent, the “intercept restriction” $\lambda = (I - A)\mu$
must hold. Subtracting $(V_{t-1} - \mu)$ from both sides of the first equation and subtracting $V_{t-1}$ from both
sides of the second gives

\[ \nabla V_t = (A - I)(V_{t-1} - \mu) + e_t \quad \text{and} \quad \nabla V_t = \lambda + (A - I)V_{t-1} + e_t \]

In the cointegration case, recall that $A - I = \alpha\beta'$, with $\alpha_p$ representing a k x (k-r) matrix
such that $\alpha_p'\alpha = 0$ and so $\alpha_p'(A - I) = \alpha_p'\alpha\beta' = 0$. Multiplying by $\alpha_p'$ displays the
“common trends” in the vector process. The equations become

\[ \alpha_p'\nabla V_t = 0 + \alpha_p'e_t \quad \text{and} \quad \alpha_p'\nabla V_t = \alpha_p'\lambda + 0 + \alpha_p'e_t \]





The elements of vector $\alpha_p'V_t$ are seen to be driftless random walks in the left-hand equation since
their first differences are white noise processes. The right-hand equation appears to describe random
walks with drift terms given by the elements of vector $\alpha_p'\lambda$. Of course, once you remember the
“intercept restriction” $\lambda = (I - A)\mu$ you see that $\alpha_p'\lambda = \alpha_p'(I - A)\mu = 0$. Nevertheless, some
practitioners are interested in the possibility of an unrestricted (nonzero) drift in such data. Such data
will display rather regular upward or downward trends. As in the case of univariate unit root tests,
you might prefer to associate the unrestricted drift case with a deviations form that allows for such
trends. Subtracting $V_{t-1} - \mu - \lambda(t-1)$ from both sides of

\[ V_t - \mu - \lambda t = A(V_{t-1} - \mu - \lambda(t-1)) + e_t \]

gives

\[ \nabla V_t - \lambda = -(I - A)(V_{t-1} - \mu - \lambda(t-1)) + e_t \]

Multiplying by $\alpha_p'$ on both sides and remembering that $\alpha_p'(I - A) = 0$, the common trends for this
model are given by

\[ \alpha_p'\nabla V_t = \alpha_p'\lambda + 0 + \alpha_p'e_t \]

Further discussion about the role of the intercept in cointegration can be found in Johansen (1994).
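
A quick numerical illustration of why the intercept restriction removes drift from the common trend: the sketch below assumes a lag 1 model whose A - I equals the rank-one matrix from the earlier hypothetical example, with means (15, 10). These values are borrowed only for illustration.

```python
ALPHA = [-0.04, 0.01]      # adjustment coefficients (illustrative lag 1 model)
BETA = [1.0, -1.0]         # cointegrating vector
ALPHA_PERP = [1.0, 4.0]    # satisfies ALPHA_PERP' ALPHA = 0
MU = [15.0, 10.0]          # vector of means

# A - I = alpha beta'
a_minus_i = [[ALPHA[i] * BETA[j] for j in range(2)] for i in range(2)]

# Intercept restriction: lambda = (I - A) mu = -(A - I) mu
lam = [-sum(a_minus_i[i][j] * MU[j] for j in range(2)) for i in range(2)]

# Drift of the common trend is alpha_perp' lambda.
drift = sum(ALPHA_PERP[i] * lam[i] for i in range(2))

print("lambda:", [round(x, 4) for x in lam], "common trend drift:", round(drift, 12))
```

An intercept that does not satisfy the restriction would generally give a nonzero value here, which is the unrestricted drift case discussed above.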

In the case of higher-order models such as $V_t = A_1V_{t-1} + A_2V_{t-2} + e_t$ or
$\nabla V_t = (A_1 + A_2 - I)V_{t-1} - A_2\nabla V_{t-1} + e_t$, the estimate of $(A_1 + A_2 - I)$ that would be obtained by
multivariate multiple regression can be obtained in three stages as follows:

1. Regress $\nabla V_t$ on $\nabla V_{t-1}$ getting residual matrix $R_{1t}$

2. Regress $V_{t-1}$ on $\nabla V_{t-1}$ getting residuals $R_{2,t-1}$

3. Regress $R_{1t}$ on $R_{2,t-1}$
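
The three stages rest on a standard regression identity: partialling the lagged difference out of both the response and the lagged level gives the same lagged-level coefficient as the joint regression. The sketch below checks this for a scalar series with no intercept. The data values are made up for illustration, and the scalar setting is a simplification of the vector case in the text.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def residuals(y, x):
    """Residuals from a no-intercept regression of y on x."""
    slope = dot(x, y) / dot(x, x)
    return [yi - slope * xi for yi, xi in zip(y, x)]


def staged_coefficient(dy, lag_level, lag_diff):
    r1 = residuals(dy, lag_diff)          # stage 1
    r2 = residuals(lag_level, lag_diff)   # stage 2
    return dot(r2, r1) / dot(r2, r2)      # stage 3


def joint_coefficient(dy, lag_level, lag_diff):
    """Lagged-level coefficient from regressing dy on both regressors."""
    sxx, szz, sxz = dot(lag_level, lag_level), dot(lag_diff, lag_diff), dot(lag_level, lag_diff)
    sxy, szy = dot(lag_level, dy), dot(lag_diff, dy)
    return (sxy * szz - sxz * szy) / (sxx * szz - sxz ** 2)


# Hypothetical series, for illustration only.
v = [1.0, 1.4, 1.1, 1.9, 2.3, 2.0, 2.8, 3.1, 2.9, 3.6]
dy = [v[t] - v[t - 1] for t in range(2, len(v))]
lag_level = [v[t - 1] for t in range(2, len(v))]
lag_diff = [v[t - 1] - v[t - 2] for t in range(2, len(v))]

staged = staged_coefficient(dy, lag_level, lag_diff)
joint = joint_coefficient(dy, lag_level, lag_diff)
print(staged, joint)
```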

In higher-order models, then, you can simply replace $\nabla V_t$ and $V_{t-1}$ with $R_{1t}$ and $R_{2,t-1}$ and follow
the same steps as described earlier for a lag 1 model. In a lag p model, steps 1 and 2 would have
regressors $\nabla V_{t-1}, \ldots, \nabla V_{t-p+1}$, and furthermore, Johansen shows that seasonal dummy variables can
be added as regressors without altering the limit distributions of his tests. The procedure has been
described here in a manner that emphasizes its similarity to univariate unit root testing. The reader
familiar with Johansen’s method may note that he uses a slightly different parameterization that
places the lag levels at the furthest lag rather than lag 1. For example, $V_t = A_1V_{t-1} + A_2V_{t-2} + e_t$
becomes $\nabla V_t = (A_1 - I)\nabla V_{t-1} + (A_1 + A_2 - I)V_{t-2} + e_t$. The same “impact matrix,” as it is called,
$\Pi = A_1 + A_2 - I$, appears in either format, and inferences about its rank are the same either way.


5.2.9 PROC VARMAX

Returning to the Amazon.com data, PROC VARMAX is used to produce some of the cointegration 
computations that have just been discussed. 

PROC VARMAX DATA=AMAZON;
   MODEL HIGH LOW / P=3 LAGMAX=5 ECM=(RANK=1 NORMALIZE=HIGH)
         COINTTEST;
   COINTEG RANK=1 H=(1, -1);
   OUTPUT OUT=OUT1 LEAD=50;
   ID T INTERVAL=DAY;
RUN;

This requests a vector autoregressive model of order 3, VAR(3), on variables high and low. They are
the log transformed high and low prices for Amazon.com stock. Diagnostics of fit will be given up to
lagmax = 5. The error correction model, ECM, is assigned a rank 1, meaning that the impact matrix
$A_1 + A_2 + A_3 - I = \Pi = \alpha\beta'$ is such that $\alpha$ and $\beta$ are column vectors (rank 1). Here $A_1$, $A_2$, and
$A_3$ represent the VAR coefficient matrices. The normalize option asks PROC VARMAX to report
the multiple of $\beta'$ that has 1 as the coefficient of high. Recall that if $\beta'V_t$ is a stationary linear
combination of elements of the random vector $V_t$, then so is any multiple of it. The COINTTEST
option asks for a test of the cointegrating rank, while the COINTEG statement tests the hypothesis
that the cointegrating vector $\beta'$ can be expressed as a multiple of H' = (1,-1). Only a few of the
many items produced by PROC VARMAX are shown in Output 5.17.


Output 5.17  VARMAX on Amazon.com Data, Part 1

                       The VARMAX Procedure

            Number of Observations                  509
            Number of Pairwise Missing                0

Variable   Type   NoMissN      Mean    StdDev       Min       Max
high       DEP        509   3.12665   1.23624   1.06326   5.39929
low        DEP        509   3.05067   1.22461   0.96508   5.31812

                      Cointegration Rank Test

H_0:      H_1:                             Critical   Drift      DriftIn
Rank=r    Rank>r    Eigenvalue    Trace       Value   InECM      Process
0         0             0.1203    65.50       15.34   Constant   Linear
1         1             0.0013     0.66        3.84

            Cointegration Rank Test under the Restriction

H_0:      H_1:                             Critical   Drift      DriftIn
Rank=r    Rank>r    Eigenvalue    Trace       Value   InECM      Process
0         0             0.1204    71.15       19.99   Constant   Constant
1         1             0.0123     6.26        9.13

Whether or not the intercept restriction (that $\alpha_p'$ annihilates the intercept) is imposed, the hypothesis
of r = 0 cointegrating vectors is rejected. For example, in the unrestricted case, Johansen’s “trace
test” has value 65.50, exceeding the critical value 15.34, so r = 0 is rejected. The test for r = 1
versus r > 1 does not reject the r = 1 null hypothesis. Thus Johansen’s test indicates a single (r = 1)
cointegrating vector, and hence a single (k - r = 2 - 1 = 1) common trend. Note that the tests are
based on eigenvalues, as might be anticipated from the earlier discussion linking squared canonical
correlations to eigenvalues. From the graph, it would seem that a drift or linear trend term would be
appropriate here, so the test without the restriction seems appropriate, though both tests agree that
r = 1 anyway. Assuming a rank r = 1, the null hypothesis that $\alpha_p'$ annihilates the intercept is tested by
comparing eigenvalues of certain matrices with and without this intercept restriction. The null
hypothesis that the intercept restriction holds is rejected using the chi-square test 5.60 with 1 degree
of freedom. In light of the plot, Output 5.10, it is not surprising to find a drift in the common trend.
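
The reported test statistics can be approximately reproduced from the printed eigenvalues. The sketch below assumes the usual form of the trace statistic, minus T times the sum of log(1 - eigenvalue), and a log-ratio form for the restriction test. It also assumes an effective sample size of 509 - 3 = 506 (observations minus the lag order). These are assumptions, not details taken from the PROC VARMAX documentation, and they match the printed values only to rounding.

```python
import math

T_EFF = 509 - 3                    # assumed effective sample size
UNRESTRICTED = [0.1203, 0.0013]    # eigenvalues, unrestricted intercept
RESTRICTED = [0.1204, 0.0123]      # eigenvalues under the restriction


def trace_statistic(eigs, r):
    """Trace statistic for H0: rank = r, using eigenvalues r+1, ..., k."""
    return -T_EFF * sum(math.log(1.0 - lam) for lam in eigs[r:])


def restriction_chi_square(r):
    """Log-ratio statistic comparing the two sets of eigenvalues at rank r."""
    return T_EFF * sum(math.log((1.0 - u) / (1.0 - v))
                       for u, v in zip(UNRESTRICTED[r:], RESTRICTED[r:]))


for r in (0, 1):
    print(f"rank {r}: trace {trace_statistic(UNRESTRICTED, r):.2f} (unrestricted), "
          f"{trace_statistic(RESTRICTED, r):.2f} (restricted), "
          f"restriction chi-square {restriction_chi_square(r):.2f}")
```

Small differences from the printed output are expected because the eigenvalues are only available to four decimal places.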


Output 5.17a  VARMAX on Amazon.com Data, Part 2

               Test of the Restriction when Rank=r

         Eigenvalue
         On                          Chi-            Prob>
Rank     Restrict     Eigenvalue     Square    DF    ChiSq
0          0.1204         0.1203       5.65     2    0.0593
1          0.0123         0.0013       5.60     1    0.0179

            Long-Run Parameter BETA Estimates

Variable      Dummy 1      Dummy 2
high          1.00000      1.00000
low          -1.01036     -0.24344

          Adjustment Coefficient ALPHA Estimates

Variable      Dummy 1      Dummy 2
high         -0.06411     -0.00209
low           0.35013     -0.00174


The long-run parameter estimates in Output 5.17a allow the user to estimate impact matrices
$\Pi = \alpha\beta'$ of various ranks. For this data the rank 1 and rank 2 versions of $\Pi$ are

\[
\begin{pmatrix} -0.06411 \\ 0.35013 \end{pmatrix}(1.00000 \;\; -1.01036)
= \begin{pmatrix} -0.064 & 0.0648 \\ 0.350 & -0.3538 \end{pmatrix}
\]

and

\[
\begin{pmatrix} -0.06411 & -0.00209 \\ 0.35013 & -0.00174 \end{pmatrix}
\begin{pmatrix} 1.00000 & -1.01036 \\ 1.00000 & -0.24344 \end{pmatrix}
= \begin{pmatrix} -0.0662 & 0.065283 \\ 0.34839 & -0.353334 \end{pmatrix}
\]

These are almost the same, as might be expected since there was very little evidence from the test
that the rank is greater than 1. In this computation, no restriction is made on the intercept.
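
The two impact matrices follow from the printed ALPHA and BETA columns by summing outer products over the first r columns. A short Python sketch of that arithmetic, with illustrative names:

```python
# Columns "Dummy 1" and "Dummy 2" from Output 5.17a, each ordered (high, low).
BETA_COLS = [[1.0, -1.01036], [1.0, -0.24344]]
ALPHA_COLS = [[-0.06411, 0.35013], [-0.00209, -0.00174]]


def impact_matrix(rank):
    """alpha beta' using only the first `rank` columns of ALPHA and BETA."""
    return [[sum(ALPHA_COLS[j][i] * BETA_COLS[j][k] for j in range(rank))
             for k in range(2)]
            for i in range(2)]


rank1 = impact_matrix(1)
rank2 = impact_matrix(2)
print("rank 1:", [[round(x, 4) for x in row] for row in rank1])
print("rank 2:", [[round(x, 6) for x in row] for row in rank2])
```

The second column pair contributes very little because its ALPHA entries are close to 0, which is why the two versions nearly coincide.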

Now suppose $W_{t-1}$ is an augmented version of $V_{t-1}$, namely a vector whose last entry is 1 and
whose first entries are the same as those of $V_{t-1}$. For simplicity consider the lag 1 model. Write the
model as $\nabla V_t = \alpha\beta_+'W_{t-1} + e_t$, where $\beta_+'$ is the same as $\beta'$ except for its last column. Recall the
previously mentioned transformation matrix T constructed by stacking $\alpha_p'$ above $\beta'$, where $\alpha_p'$ is a
(k-r) x k matrix such that $\alpha_p'\alpha = 0$. Because $\alpha_p'\nabla V_t = \alpha_p'\alpha\beta_+'W_{t-1} + \alpha_p'e_t = \alpha_p'e_t$, it follows that the
elements of $\alpha_p'V_t$ are driftless unit root processes. These are the first k - r elements of $TV_t$. The
last r elements are the stationary linear combinations. They satisfy $\nabla\beta'V_t = \beta'\alpha\beta_+'W_{t-1} + \beta'e_t$. The
elements in the last column of $\beta'\alpha\beta_+'$ get multiplied by 1, the last entry of $W_{t-1}$. In other words, they
represent the intercepts for the stationary linear combinations. This shows how the addition of an
extra element, a 1, to $V_{t-1}$ forces a model in which the unit root components do not drift. The result
is the same in higher-order models. PROC VARMAX gives “dummy variables” for this case as well.
Because the last column of ALPHA is 0, the “Dummy 3” columns could be omitted as you might
expect from the preceding discussion. Having previously rejected the restriction of no drift in the
common trends, you are not really interested in these results that assume the restriction. In another
data set, they might be of interest and hence are shown for completeness.


Output 5.17b  VARMAX on Amazon.com Data, Part 3

     Long-Run Coefficient BETA based on the Restricted Trend

Variable      Dummy 1      Dummy 2      Dummy 3
high          1.00000      1.00000      1.00000
low          -1.01039     -0.79433      0.03974
1            -0.04276     -1.48420     -2.80333

     Adjustment Coefficient ALPHA based on the Restricted Trend

Variable      Dummy 1      Dummy 2        Dummy 3
high         -0.05877     -0.00744    1.49463E-16
low           0.35453     -0.00614     9.0254E-16



5.2.10 Interpreting the Estimates 

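A list of estimates follows (Output 5.17c) that shows that the fitted rank 1 model for the log
transformed high and low prices, $H_t$ and $L_t$, is

\[
\begin{pmatrix} \nabla H_t \\ \nabla L_t \end{pmatrix}
= \begin{pmatrix} 0.00857 \\ -0.01019 \end{pmatrix}
+ \begin{pmatrix} -0.064 & 0.065 \\ 0.350 & -0.354 \end{pmatrix}
\begin{pmatrix} H_{t-1} \\ L_{t-1} \end{pmatrix}
+ \begin{pmatrix} 0.044 & 0.191 \\ 0.294 & 0.013 \end{pmatrix}
\begin{pmatrix} \nabla H_{t-1} \\ \nabla L_{t-1} \end{pmatrix}
+ \begin{pmatrix} -0.097 & 0.039 \\ 0.068 & -0.132 \end{pmatrix}
\begin{pmatrix} \nabla H_{t-2} \\ \nabla L_{t-2} \end{pmatrix}
+ \begin{pmatrix} e_{1t} \\ e_{2t} \end{pmatrix}
\]

and error variance matrix

\[ \Sigma = \frac{1}{1000}\begin{pmatrix} 2.98 & 2.29 \\ 2.29 & 2.95 \end{pmatrix} \]

indicating a correlation 0.77 between the errors.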
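
The quoted correlation is the covariance divided by the product of the two standard deviations. A one-step Python check, using the innovation covariance entries as printed:

```python
import math

var_high = 0.00298    # innovation variance for high
var_low = 0.00295     # innovation variance for low
cov_high_low = 0.00229

correlation = cov_high_low / math.sqrt(var_high * var_low)
print(f"error correlation {correlation:.2f}")
```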


Output 5.17c  VARMAX on Amazon.com Data, Part 4

      Long-Run Parameter BETA Estimates given RANK = 1

Variable      Dummy 1
high          1.00000
low          -1.01036

   Adjustment Coefficient ALPHA Estimates given RANK = 1

Variable      Dummy 1
high         -0.06411
low           0.35013

                  Constant Estimates

Variable     Constant
high          0.00857
low          -0.01019

          Parameter ALPHA * BETA' Estimates

Variable         high          low
high         -0.06411      0.06478
low           0.35013     -0.35376

               AR Coefficient Estimates

DIF_Lag   Variable         high          low
1         high          0.04391      0.19078
          low           0.29375      0.01272
2         high         -0.09703      0.03941
          low           0.06793     -0.13209

          Covariance Matrix for the Innovation

Variable         high          low
high          0.00298      0.00229
low           0.00229      0.00295


5.2.11 Diagnostics and Forecasts

There follows a series of diagnostics. The regression of $\nabla H_t$ on the lagged levels and two lagged
differences of both H and L is seen to have a model F test 4.91 and R square 0.0558, and a
similar line describing the $\nabla L_t$ regression is found just below this. (See Output 5.17d.) The residuals
from these models are checked for normality and unequal variance of the autoregressive conditional
heteroscedastic, or ARCH, type. Both of these departures from assumptions are found. The
Durbin-Watson DW(1) statistics are near 2 for both residual series, and autoregressive models fit to
these residuals up to 4 lags show no significance. These tests indicate uncorrelated residuals.


Output 5.17d 

VARMAX on 
Amazon.com 
Data, Part 5 


Univariate Model Diagnostic Checks

Variable    R-square    StdDev    F Value    Prob>F

high          0.0558    0.0546       4.91    <.0001
low           0.1800    0.0543      18.25    <.0001


Univariate Model Diagnostic Checks

                        Normality      Prob>      ARCH1
Variable    DW(1)           ChiSq      ChiSq    F Value    Prob>F

high         1.98           82.93     <.0001      19.06    <.0001
low          1.98          469.45     <.0001     144.47    <.0001


Univariate Model Diagnostic Checks

                AR1                AR1-2               AR1-3               AR1-4
Variable    F Value   Prob>F   F Value   Prob>F   F Value   Prob>F   F Value   Prob>F

high           0.02   0.8749      0.07   0.9294      0.97   0.4057      1.31   0.2666
low            0.00   0.9871      0.33   0.7165      0.65   0.5847      0.81   0.5182


Recall that the spread $H_t - L_t$ was found to be stationary using a standard unit root test, and that the
estimated cointegrating relationship was $H_t - 1.01L_t$. Given these findings, it is a bit surprising that
the test that $\beta' = \phi(1, -1)$ rejects that hypothesis. However, the sample size n = 509 is somewhat
large, so rather small and practically insignificant departures from the null hypotheses might still be
statistically significant. In a similar vein, one might look at plots of residual histograms to see if they
are approximately bell shaped before worrying too much about the rejection of normality. The test
that $\beta' = \phi(1, -1)$ is referred to as a test of the restriction matrix H. The test compares eigenvalues
0.1038 and 0.1203 by comparing

(n - 3)[log(1 - 0.1038) - log(1 - 0.1203)] = 506(0.01858) = 9.40

to a chi square with 1 degree of freedom. (See Output 5.17e.)


Output 5.17e

VARMAX on 
Amazon.com 
Data, Part 6 


Restriction Matrix H with respect to BETA

Variable     Dummy 1

high         1.00000
low         -1.00000


Long-Run Coefficient BETA with respect to Hypothesis on BETA

Variable     Dummy 1

high         1.00000
low         -1.00000


Adjustment Coefficient ALPHA with respect to Hypothesis on BETA

Variable     Dummy 1

high        -0.07746
low          0.28786


Test for Restricted Long-Run Coefficient BETA

         Eigenvalue                      Chi-             Prob>
Index    OnRestrict    Eigenvalue      Square     DF      ChiSq

1            0.1038        0.1203        9.40      1     0.0022
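
The chi-square statistic and p-value in Output 5.17e can be reproduced by hand. The sketch below is an illustrative Python check, not output from PROC VARMAX; it assumes the statistic is (n - 3) times the difference of log(1 - eigenvalue) terms, as described in the text, and uses the identity that the upper tail of a 1 degree of freedom chi-square at x equals erfc(sqrt(x/2)).

```python
import math

n = 509                     # sample size quoted in the text
lam_restricted = 0.1038     # eigenvalue under the restriction on BETA
lam_unrestricted = 0.1203   # unrestricted eigenvalue

# Test statistic as written in the text.
chi_sq = (n - 3) * (math.log(1 - lam_restricted) - math.log(1 - lam_unrestricted))

# Upper tail probability of a chi-square with 1 degree of freedom.
p_value = math.erfc(math.sqrt(chi_sq / 2))

print(round(chi_sq, 2), round(p_value, 4))
```

Up to rounding of the printed eigenvalues, the result should agree with the 9.40 and 0.0022 shown in the output.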


The fitted model implies one common trend that is a unit root with drift process and one 
cointegrating vector. The last bit of code requests forecasts using the VAR(3) in rank 1 error 
correction form. These are put into an output data set, a few observations from which are shown in 
Output 5.17f. An additional complication with these data is that the market is closed on the 
weekends, so the use of the actual dates as ID variables causes a missing data message to be 
produced. An easy fix here is to use t = observation number as an ID variable, thus making the 
implicit assumption that the correlation between a Monday and the previous Friday is the same as 
between adjacent days. A portion of these data, including standard errors and upper and lower 95% 
confidence limits, is shown. 


Output 5.17f 

VARMAX on 
Amazon.com 
Data, Last 
Part 


Obs      t       high       FOR1         RES1       STD1       LCI1       UCI1

508    508    4.86272    4.88268    -0.019970    0.05461    4.77565    4.98972
509    509    4.79682    4.85402    -0.057193    0.05461    4.74698    4.96105
510    510               4.79125                 0.05461    4.68422    4.89829
511    511               4.80030                 0.08475    4.63420    4.96640
512    512               4.80704                 0.10715    4.59704    5.01704

Obs        low       FOR2         RES2       STD2       LCI2       UCI2

508    4.75359    4.81222    -0.058634    0.05430    4.70580    4.91865
509    4.71290    4.75931    -0.046406    0.05430    4.65288    4.86574
510               4.70442                 0.05430    4.59799    4.81084
511               4.70671                 0.08605    4.53806    4.87537
512               4.71564                 0.10855    4.50288    4.92840


You can observe the quick spreading of confidence intervals, typical of data whose logarithms
contain a unit root. The fact that the unit root is in some sense shared between the two series does not
do much to narrow the intervals. The drift in the underlying unit root process, or common trend, is
apparent in the forecasts. The short-term dynamics do not seem to contribute much to the forecasts,
suggesting that the last few observations were quite near the cointegrating plane. (See Output 5.18.)
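
The spreading of the intervals can be seen directly from the numbers in Output 5.17f. The following Python sketch is only an illustration; it assumes the limits are the usual normal-theory forecast plus or minus 1.96 standard errors, which reproduces the printed LCI1 and UCI1 values to within rounding.

```python
# Forecasts and standard errors for log(high) at leads 1 to 3 (Output 5.17f).
forecasts = [4.79125, 4.80030, 4.80704]
std_errs = [0.05461, 0.08475, 0.10715]

Z_975 = 1.959964  # 97.5th percentile of the standard normal distribution

# Assumed form of the limits: forecast +/- z * standard error.
limits = [(f - Z_975 * s, f + Z_975 * s) for f, s in zip(forecasts, std_errs)]

for lead, (lower, upper) in enumerate(limits, start=1):
    print(lead, round(lower, 5), round(upper, 5), round(upper - lower, 3))
```

The interval width roughly doubles between lead 1 and lead 3, which is the behavior described above for series containing a unit root.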


Output 5.18
Forecasts Using Cointegration


6.1 Introduction

In ARIMA modeling, one of the difficult tasks is to select a model. Also, if you have several related 
time series, they must satisfy some restrictive conditions in order to justify the kind of transfer 
function modeling that is available in PROC ARIMA. There must be no feedback, and, for proper 
identification and forecast intervals, multiple inputs must be independent of each other and enough 
differencing must be specified to render the series stationary. PROC STATESPACE allows 
estimation under less restrictive conditions and provides some automatic model specification ability, 
although the user is still responsible for making the series stationary. 

In Chapter 5, another procedure, PROC VARMAX, was discussed. This procedure also handles 
multiple series and, unlike STATESPACE, can perform cointegration analysis, which is appropriate 
when your series display unit root nonstationarity but some linear combination of the series is 
stationary. In other words, the transformation to stationarity is not just differencing. 

The basic idea in state space modeling is to discover the “state vector.” The state vector consists of 
the current values of all series under investigation plus enough forecasts into the future so that all 
forecasts, no matter how far away, are linear combinations of these. 


6.1.1 Some Simple Univariate Examples 

To get started, here are some models, all with mean 0, and their forecasting equations. As is
customary in discussing state space models, the symbol $\hat{Y}_{t+L|t}$ denotes a forecast of $Y_{t+L}$ using
information available at time t. In model discussions in this section, the default assumption is that the
mean has already been subtracted.

Table 6.1 One-, Two-, and Three-Step-Ahead Prediction for Different Models

\[
\begin{array}{lllll}
\text{Name} & \text{Formula} & \hat{Y}_{t+1|t} & \hat{Y}_{t+2|t} & \hat{Y}_{t+3|t} \\
\text{AR(1)} & Y_t = \alpha Y_{t-1} + e_t & \alpha Y_t & \alpha^2 Y_t & \alpha^3 Y_t \\
\text{AR(2)} & Y_t = \alpha_1 Y_{t-1} + \alpha_2 Y_{t-2} + e_t & \alpha_1 Y_t + \alpha_2 Y_{t-1} & \alpha_1 \hat{Y}_{t+1|t} + \alpha_2 Y_t & \alpha_1 \hat{Y}_{t+2|t} + \alpha_2 \hat{Y}_{t+1|t} \\
\text{MA(2)} & Y_t = e_t + \beta_1 e_{t-1} + \beta_2 e_{t-2} & \beta_1 e_t + \beta_2 e_{t-1} & \beta_2 e_t & 0 \\
\text{ARMA(1,1)} & Y_t = \alpha Y_{t-1} + e_t + \beta e_{t-1} & \alpha Y_t + \beta e_t & \alpha(\alpha Y_t + \beta e_t) & \alpha^2(\alpha Y_t + \beta e_t)
\end{array}
\]


Numerical examples and further discussion of models like these appear in Section 6.2. 
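
The forecasting equations in Table 6.1 translate directly into code. The Python sketch below is illustrative only; the function names and the numeric inputs are assumptions chosen to make the arithmetic easy to follow, and the errors e are treated as known, as the table does.

```python
def ar1_forecasts(alpha, y_t):
    """Leads 1-3 for Y_t = alpha*Y_{t-1} + e_t."""
    return [alpha ** lead * y_t for lead in (1, 2, 3)]


def ar2_forecasts(a1, a2, y_t, y_tm1):
    """Leads 1-3 for Y_t = a1*Y_{t-1} + a2*Y_{t-2} + e_t, by recursion."""
    f1 = a1 * y_t + a2 * y_tm1
    f2 = a1 * f1 + a2 * y_t
    f3 = a1 * f2 + a2 * f1
    return [f1, f2, f3]


def ma2_forecasts(b1, b2, e_t, e_tm1):
    """Leads 1-3 for Y_t = e_t + b1*e_{t-1} + b2*e_{t-2}; zero beyond lead 2."""
    return [b1 * e_t + b2 * e_tm1, b2 * e_t, 0.0]


def arma11_forecasts(alpha, beta, y_t, e_t):
    """Leads 1-3 for Y_t = alpha*Y_{t-1} + e_t + beta*e_{t-1}."""
    f1 = alpha * y_t + beta * e_t
    return [f1, alpha * f1, alpha ** 2 * f1]


# Illustrative inputs (not taken from the text).
print(ar1_forecasts(0.5, 8.0))
print(ar2_forecasts(1.0, -0.5, 2.0, 4.0))
print(ma2_forecasts(0.5, 0.25, 4.0, 8.0))
print(arma11_forecasts(0.5, 0.5, 2.0, 2.0))
```

Note that the AR(2) and ARMA(1,1) forecasts beyond the first lead are built from earlier forecasts, while the MA(2) forecasts vanish after two steps.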

A “linear combination” of a set of variables is a sum of constant coefficients times variables. For
example, 2X + 3Y and 5X - 2Y are linear combinations of X and Y. Notice that
6(2X + 3Y) - 4(5X - 2Y) = -8X + 26Y is automatically also a linear combination of X and Y; that is, linear
combinations of linear combinations are themselves also linear combinations. Note that 0X + 0Y = 0
is also a valid linear combination of X and Y. Considering $Y_t$ and $\hat{Y}_{t+L|t}$ for L = 1, 2, ... to be the
variables and considering functions of model parameters, like $\alpha^L$, to be constants, the state vector is
defined to be $(Y_t, \hat{Y}_{t+1|t}, \hat{Y}_{t+2|t}, \ldots, \hat{Y}_{t+k|t})$, where k is the smallest value such that all remaining forecasts
$\hat{Y}_{t+L|t}$ with L > k are linear combinations of the state vector elements. For the AR(1) model all
forecasts are linear combinations (multiples) of $Y_t$, so the state vector is just $(Y_t)$. For the AR(2) the
state vector is $(Y_t, \hat{Y}_{t+1|t})$. It can’t be just $Y_t$ because $\hat{Y}_{t+1|t}$ involves $Y_{t-1}$, whose value cannot be
determined from $Y_t$. However, $\hat{Y}_{t+2|t}$ is a linear combination, $\alpha_1 \hat{Y}_{t+1|t} + \alpha_2 Y_t$, of state vector entries,
and $\hat{Y}_{t+3|t}$ is a linear combination, $\alpha_1(\alpha_1 \hat{Y}_{t+1|t} + \alpha_2 Y_t) + \alpha_2 \hat{Y}_{t+1|t}$, of them too. The expressions get more
complicated, but by the “linear combination of linear combinations” argument it is clear that all
forecasts are linear combinations of $Y_t$ and $\hat{Y}_{t+1|t}$. You can see that for an AR(p) the state vector will
have p elements. For moving averages it is assumed that current and past $e_t$s can be well
approximated from the observed data; that is, MA models need to be invertible. Acting as though
$e_t$s that have already occurred are known, it is clear from the MA(2) example that for an MA(q)
model, forecasts more than q steps ahead are trivial linear combinations (0) of state vector elements.
Finally, for mixed models the forecasts are eventually determined through autoregressive type
recursions and, by the linear combination of linear combinations argument, must be linear
combinations of state vector elements from that point on.
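
The “linear combination of linear combinations” argument for the AR(2) case can be made concrete. The sketch below is a hypothetical Python illustration (coefficients and data values are made up): it tracks, for each lead L, the pair of constants that multiply the two state vector entries, and confirms that the resulting forecasts match the ordinary forecast recursion.

```python
def ar2_state_coefficients(a1, a2, max_lead):
    """Return {L: (c0, c1)} so that forecast(L) = c0*Y_t + c1*Yhat_{t+1|t}."""
    coef = {0: (1.0, 0.0), 1: (0.0, 1.0)}   # the state vector entries themselves
    for lead in range(2, max_lead + 1):
        p1, p2 = coef[lead - 1], coef[lead - 2]
        # Forecast recursion: Yhat(L) = a1*Yhat(L-1) + a2*Yhat(L-2).
        coef[lead] = (a1 * p1[0] + a2 * p2[0], a1 * p1[1] + a2 * p2[1])
    return coef


a1, a2 = 1.2, -0.5                       # illustrative AR(2) coefficients
y_t, y_tm1 = 3.0, 1.0                    # illustrative last two observations
state = (y_t, a1 * y_t + a2 * y_tm1)     # (Y_t, Yhat_{t+1|t})

coef = ar2_state_coefficients(a1, a2, 6)

# Direct recursive forecasts, for comparison.
direct = [y_t, state[1]]
for lead in range(2, 7):
    direct.append(a1 * direct[-1] + a2 * direct[-2])

for lead in range(2, 7):
    c0, c1 = coef[lead]
    print(lead, round(c0 * state[0] + c1 * state[1], 6), round(direct[lead], 6))
```

For lead 3 the constants are a1*a2 on the current value and a1**2 + a2 on the one-step forecast, which is the expression given in the text after collecting terms.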

The state vector contains all the information needed to forecast into the infinite future. During an 
early space shuttle mission in which the landing was broadcast, the mission control engineers were 
heard to say, “Your state vector is looking good.” What did that mean? Numerical measurements of 
height, velocity, deceleration, and so forth were being taken, and from them, forecasts of the flight 
path into the future were being computed. Of course these state vector entries were being updated
quickly and state space forecasting is based on this updating idea. At time t + 1, the state vector will
be updated to $(Y_{t+1}, \hat{Y}_{t+2|t+1}, \hat{Y}_{t+3|t+1}, \ldots, \hat{Y}_{t+k+1|t+1})$. The updating equation is the model in PROC
STATESPACE and it is the thing that you are trying to estimate from the data. In the space shuttle
example, if the elements of the state vector included height, deceleration, and location information,
then a state vector that “looked good” would be one whose projections forecast a landing on the
runway.


6.1.2 A Simple Multivariate Example


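Now suppose a vector process is of interest. An easy case to consider is

\[
\begin{pmatrix} Y_{1t} \\ Y_{2t} \end{pmatrix}
= \begin{pmatrix} 1.3 & -0.9 \\ 0.1 & 0.7 \end{pmatrix}
  \begin{pmatrix} Y_{1,t-1} \\ Y_{2,t-1} \end{pmatrix}
+ \begin{pmatrix} -0.4 & 0 \\ 0 & 0 \end{pmatrix}
  \begin{pmatrix} Y_{1,t-2} \\ Y_{2,t-2} \end{pmatrix}
+ \begin{pmatrix} e_{1t} \\ e_{2t} \end{pmatrix}
\]

This is a vector autoregressive model, VAR, of dimension 2 (2 elements) and of order 2 (maximum
lag is 2). The state vector is

\[
Z_t = \begin{pmatrix} Y_{1t} \\ Y_{2t} \\ \hat{Y}_{1,t+1|t} \end{pmatrix}
\]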

To see why this is the case, first note from the bottom row in the model equation that the
one-step-ahead predictor $\hat{Y}_{2,t+1|t} = 0.1Y_{1t} + 0.7Y_{2t}$ is clearly a linear combination of state vector elements and
thus does not need to be included in the state vector. Next, note that if the best predictor is used and
the coefficients are known as is assumed here, then the forecast $\hat{Y}_{1,t+1|t}$ will differ from $Y_{1,t+1}$ only by
$e_{1,t+1}$, which is the error term that, at time t, has yet to be realized. The same is true for $Y_{2,t+1}$, and
using $\hat{Y}_{2,t+1|t} = 0.1Y_{1t} + 0.7Y_{2t}$ you thus have

\[
\begin{aligned}
Y_{1,t+1} &= \hat{Y}_{1,t+1|t} + e_{1,t+1} \\
Y_{2,t+1} &= 0.1Y_{1t} + 0.7Y_{2t} + e_{2,t+1}
\end{aligned}
\]

Noting from the top row of the model equation that $Y_{1,t+2} = 1.3Y_{1,t+1} - 0.9Y_{2,t+1} - 0.4Y_{1t} + e_{1,t+2}$, it is
seen that forecasting one step ahead using information available up through time t + 1 would produce

\[
\begin{aligned}
\hat{Y}_{1,t+2|t+1} &= 1.3Y_{1,t+1} - 0.9Y_{2,t+1} - 0.4Y_{1t} \\
&= 1.3(\hat{Y}_{1,t+1|t} + e_{1,t+1}) - 0.9(0.1Y_{1t} + 0.7Y_{2t} + e_{2,t+1}) - 0.4Y_{1t} \\
&= -0.49Y_{1t} - 0.63Y_{2t} + 1.3\hat{Y}_{1,t+1|t} + 1.3e_{1,t+1} - 0.9e_{2,t+1}
\end{aligned}
\]

These three equations show how to update from $Z_t$ to $Z_{t+1}$. You have

\[
\begin{pmatrix} Y_{1,t+1} \\ Y_{2,t+1} \\ \hat{Y}_{1,t+2|t+1} \end{pmatrix}
= \begin{pmatrix} 0 & 0 & 1 \\ 0.1 & 0.7 & 0 \\ -0.49 & -0.63 & 1.3 \end{pmatrix}
  \begin{pmatrix} Y_{1t} \\ Y_{2t} \\ \hat{Y}_{1,t+1|t} \end{pmatrix}
+ \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1.3 & -0.9 \end{pmatrix}
  \begin{pmatrix} e_{1,t+1} \\ e_{2,t+1} \end{pmatrix}
\]

which has the form

\[
Z_{t+1} = FZ_t + GE_{t+1}
\]

This looks quite a bit like a vector autoregressive model, and you might think of the state space
approach as an attempt to put all vector ARMA processes in a canonical form that looks like an
AR(1), because it happens that every possible vector ARMA process of any dimension can be cast
into the state space form $Z_{t+1} = FZ_t + GE_{t+1}$. While this eliminates the problem of identifying the
autoregressive and moving average orders, it introduces a new problem, namely, deciding from the
observed data what elements are needed to construct the state vector.
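
To see that the state space update really reproduces the VAR, the two recursions can be run side by side on the same shocks. The Python sketch below is illustrative and is not how PROC STATESPACE works internally; it assumes zero starting values for both recursions, and the seed and series length are arbitrary.

```python
import random

A1 = [[1.3, -0.9], [0.1, 0.7]]
A2 = [[-0.4, 0.0], [0.0, 0.0]]
F = [[0.0, 0.0, 1.0], [0.1, 0.7, 0.0], [-0.49, -0.63, 1.3]]
G = [[1.0, 0.0], [0.0, 1.0], [1.3, -0.9]]


def matvec(m, v):
    """Matrix times vector for small lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


random.seed(1)
shocks = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]

# VAR(2) recursion Y_t = A1 Y_{t-1} + A2 Y_{t-2} + E_t from zero starting values.
y_prev2, y_prev1 = [0.0, 0.0], [0.0, 0.0]
var_path = []
for e in shocks:
    y = [a + b + c for a, b, c in zip(matvec(A1, y_prev1), matvec(A2, y_prev2), e)]
    var_path.append(y)
    y_prev2, y_prev1 = y_prev1, y

# State space recursion Z_{t+1} = F Z_t + G E_{t+1} from the zero state.
z = [0.0, 0.0, 0.0]
ss_path = []
for e in shocks:
    z = [a + b for a, b in zip(matvec(F, z), matvec(G, e))]
    ss_path.append(z[:2])          # first two state entries are Y1 and Y2

max_gap = max(abs(a - b) for u, v in zip(var_path, ss_path) for a, b in zip(u, v))
print(max_gap)
```

The largest discrepancy between the two simulated paths should be at the level of floating-point rounding.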

Prior to discussing this new problem, a simulation of 2000 values from this bivariate VAR model is
used to produce some state space output. The data set TEST contains the variables Y and X,
corresponding to $Y_{1t}$ and $Y_{2t}$, respectively. The code

PROC STATESPACE DATA=TEST; 

VAR Y X; 

RUN; 

is all that is needed to produce the results in Output 6.1. 


Output 6.1 
PROC 

STATESPACE 
on Generated 
Data 


The STATESPACE Procedure

Selected Statespace Form and Fitted Model

State Vector

Y(T;T)    X(T;T)    Y(T+1;T)


Estimate of Transition Matrix

       0           0           1
0.112888    0.683036     -0.0182
-0.52575     -0.5764     1.34468


Input Matrix for Innovation

       1           0
       0           1
1.322741    -0.85804


Variance Matrix for Innovation

1.038418    0.018465
0.018465      0.9717


Parameter Estimates

                          Standard
Parameter    Estimate        Error    t Value

F(2,1)       0.112888     0.027968       4.04
F(2,2)       0.683036     0.040743      16.76
F(2,3)       -0.01820     0.028964      -0.63
F(3,1)       -0.52575     0.037233     -14.12
F(3,2)       -0.57640     0.054350     -10.61
F(3,3)       1.344680     0.038579      34.86
G(3,1)       1.322741     0.021809      60.65
G(3,2)       -0.85804     0.022550     -38.05


The state vector has been correctly identified as containing $Y_{1t}$, $Y_{2t}$, and $\hat{Y}_{1,t+1|t}$, as is seen in the
notation Y(T;T) X(T;T) Y(T+1;T) using the X Y variable names. Had this not been the case, the
user could specify


FORM Y 2 X 1 

to force a one-step-ahead Y predictor and no predictions of future X to enter the state vector. Of 
course this assumes the unlikely scenario that the user has some prior knowledge of the state vector’s 
true form. The matrix F is referred to as the transition matrix and G as the input matrix in the output. 
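Comparing the true and estimated F matrices you see that

\[
F = \begin{pmatrix} 0 & 0 & 1 \\ 0.1 & 0.7 & 0 \\ -0.49 & -0.63 & 1.3 \end{pmatrix}
\quad \text{and} \quad
\hat{F} = \begin{pmatrix} 0 & 0 & 1 \\ 0.11 & 0.68 & -0.02 \\ -0.53 & -0.58 & 1.34 \end{pmatrix}
\]

and for the input matrix G

\[
G = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1.3 & -0.9 \end{pmatrix}
\quad \text{and} \quad
\hat{G} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1.32 & -0.86 \end{pmatrix}
\]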


Entries of 0 or 1 are known once the state vector has been determined. They are structural parameters 
that do not require estimation. No elements of F or G are more than 2 standard errors away from the 
true values, and all estimates are quite close to the true values both numerically and statistically. 
Knowing that the estimate -0.02 is in fact an estimate of 0, you would expect its t statistic to be 
smaller than 2 in magnitude, which it is (t = -0.63). You might want to drop that term from your 
model by forcing its coefficient to 0, using the statement 

RESTRICT F(2,3)= 0; 

in the PROC STATESPACE step to restrict that row 2, column 3 element to 0. Doing so produces
the results in Output 6.2.


Output 6.2
RESTRICT Statement in PROC STATESPACE


The STATESPACE Procedure

Selected Statespace Form and Fitted Model

State Vector

Y(T;T)    X(T;T)    Y(T+1;T)


Estimate of Transition Matrix

       0           0           1
0.095449    0.707423           0
-0.51028    -0.59826    1.328525


Input Matrix for Innovation

       1           0
       0           1
1.322604    -0.85752


Variance Matrix for Innovation

1.038417    0.018442
0.018442    0.971882


Parameter Estimates

                          Standard
Parameter    Estimate        Error    t Value

F(2,1)       0.095449     0.003469      27.51
F(2,2)       0.707423     0.012395      57.07
F(3,1)       -0.51028     0.028997     -17.60
F(3,2)       -0.59826     0.043322     -13.81
F(3,3)       1.328525     0.029876      44.47
G(3,1)       1.322604     0.021816      60.62
G(3,2)       -0.85752     0.022552     -38.03

The estimated elements of F and G are again close to their true values. Plots of both series and their
forecasts are seen in Output 6.3.


Output 6.3
Forecasts for Generated Data



The forecasts seem to have a little more interesting structure than some you have previously seen. 
This has to do with the nature of the roots of the characteristic equation. 

As in a univariate series, the behavior of a VAR of the form

\[
Y_t = A_1Y_{t-1} + A_2Y_{t-2} + \cdots + A_pY_{t-p} + E_t
\]

is determined by the roots of a “characteristic equation,” and the same is true for a vector ARMA.
Here $E_t$ is a vector of random normal variables that can be contemporaneously correlated in any
arbitrary way, but must be uncorrelated across time. $Y_t$ is a dimension k vector of deviations from
means. For k = 3 these might be the time t deviations of GDP, unemployment, and interest rates
from their long-term means, and each $A_j$ is a k x k matrix of parameters to be estimated. The
characteristic equation involves the determinant

\[
\left| m^p I - m^{p-1}A_1 - m^{p-2}A_2 - \cdots - A_p \right|
\]

and the values of m that make this 0 are the roots. In the VAR example currently under study,

\[
\begin{aligned}
\left| m^2 \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
 - m \begin{pmatrix} 1.3 & -0.9 \\ 0.1 & 0.7 \end{pmatrix}
 - \begin{pmatrix} -0.4 & 0 \\ 0 & 0 \end{pmatrix} \right|
&= \left| \begin{pmatrix} m^2 - 1.3m + 0.4 & 0.9m \\ -0.1m & m^2 - 0.7m \end{pmatrix} \right| \\
&= (m^2 - 1.3m + 0.4)(m^2 - 0.7m) + 0.09m^2 \\
&= m(m^3 - 2m^2 + 1.4m - 0.28) \\
&= m(m - 0.32966342378782)(m^2 - 1.6703366m + 0.84935)
\end{aligned}
\]

whose roots are 0, 0.32966 and the complex pair $0.83517 \pm 0.38967i$. The complex pair of roots has
representation $\sqrt{0.84935}\,(\cos\theta \pm i\sin\theta)$, so $\theta$ = Atan(0.38967/0.83517) = 25.0 degrees, implying a
damped sinusoidal component with damping rate $\sqrt{0.84935}$ and period 360/25.0 = 14.4 time
periods as you forecast L periods ahead. This seems consistent with the graph. At a lead of around
L = 14, the forecasts hit local low points as they did at the end of the data. Each low is about
$\sqrt{0.84935}^{\,14} = 0.32$, or about 1/3 of what it was then. All of these roots are less than 1 in magnitude,
this being the stationarity condition. Some sources write the characteristic equation in ascending
powers of m, namely,

\[
\left| I - mA_1 - m^2A_2 - \cdots - m^pA_p \right| = 0
\]

whose roots they then require to all be greater than 1 in magnitude for stationarity.
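
The roots of the cubic factor can be checked numerically. The following Python sketch is an illustration rather than part of the SAS analysis: it locates the real root of m^3 - 2m^2 + 1.4m - 0.28 by bisection, deflates to a quadratic, and reports the modulus, angle, and implied period of the complex pair. By this calculation the pair is about 0.83517 ± 0.38967i, with an angle near 25.0 degrees and a period near 14.4.

```python
import cmath
import math


def cubic(m):
    """Cubic factor of the characteristic polynomial for the example VAR."""
    return m ** 3 - 2 * m ** 2 + 1.4 * m - 0.28


# Bisection for the single real root, bracketed by cubic(0) < 0 < cubic(1).
lo, hi = 0.0, 1.0
for _ in range(200):
    mid = (lo + hi) / 2
    if cubic(lo) * cubic(mid) <= 0:
        hi = mid
    else:
        lo = mid
real_root = (lo + hi) / 2

# Deflate: m^3 - 2m^2 + 1.4m - 0.28 = (m - r)(m^2 + b*m + c).
b = real_root - 2
c = 0.28 / real_root
disc = cmath.sqrt(b * b - 4 * c)
pair = ((-b + disc) / 2, (-b - disc) / 2)

modulus = abs(pair[0])
theta_deg = math.degrees(cmath.phase(pair[0]))
period = 360 / theta_deg

print(round(real_root, 5), pair[0], round(modulus ** 2, 5))
print(round(theta_deg, 1), round(period, 1), round(modulus ** 14, 2))
```

All three nonzero roots have modulus below 1, which is the stationarity condition stated above.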

In a VAR of order 2 you have

\[
Y_t = A_1Y_{t-1} + A_2Y_{t-2} + E_t
\]

which is sometimes written in a matrix form like this:

\[
\begin{pmatrix} Y_{t-1} \\ Y_t \end{pmatrix}
= \begin{pmatrix} 0 & I \\ A_2 & A_1 \end{pmatrix}
  \begin{pmatrix} Y_{t-2} \\ Y_{t-1} \end{pmatrix}
+ \begin{pmatrix} 0 \\ E_t \end{pmatrix}
\]

This simply says that $Y_{t-1} = Y_{t-1}$ and $Y_t = A_2Y_{t-2} + A_1Y_{t-1} + E_t$, so it consists of a trivial identity and
the original AR(2). If you substitute the A matrices of the current example, you have

\[
\begin{pmatrix} Y_{1,t-1} \\ Y_{2,t-1} \\ Y_{1t} \\ Y_{2t} \end{pmatrix}
= \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ -0.4 & 0 & 1.3 & -0.9 \\ 0 & 0 & 0.1 & 0.7 \end{pmatrix}
  \begin{pmatrix} Y_{1,t-2} \\ Y_{2,t-2} \\ Y_{1,t-1} \\ Y_{2,t-1} \end{pmatrix}
+ \begin{pmatrix} 0 \\ 0 \\ e_{1t} \\ e_{2t} \end{pmatrix}
\]

which looks somewhat similar to the state space representation; however, this system has dimension
4, not 3 as you had previously. Had the matrix $A_2$ been of full rank, the system would have had full
rank, the state vector would have had dimension 4, and the above representation would be another
type of state space representation of the vector process. It would differ from the process used in
PROC STATESPACE in that current and lagged Ys, rather than current Ys and predictions,
constitute the state vector elements. As it stands, the system is not full rank and can be reduced by
simply eliminating the second row and second column of the coefficient matrix. That second row
produces the trivial identity $Y_{2,t-1} = Y_{2,t-1}$, which, of course, is true whether you put it in the system
or not. The second column is all 0s, so leaving out the second row and column makes no real change
in the system. The resulting reduction gives

\[
\begin{pmatrix} Y_{1,t-1} \\ Y_{1t} \\ Y_{2t} \end{pmatrix}
= \begin{pmatrix} 0 & 1 & 0 \\ -0.4 & 1.3 & -0.9 \\ 0 & 0.1 & 0.7 \end{pmatrix}
  \begin{pmatrix} Y_{1,t-2} \\ Y_{1,t-1} \\ Y_{2,t-1} \end{pmatrix}
+ \begin{pmatrix} 0 \\ e_{1t} \\ e_{2t} \end{pmatrix}
\]

again having a familiar form $Z_t = FZ_{t-1} + GE_t$. The first row gives a trivial identity, but it is needed
to make the system square. This 3 x 3 system is observationally equivalent to the 4 x 4 system in
that, for any e sequence and any given initial values, the two systems will produce exactly the same
sequence of Ys. In the theoretical research on STATESPACE methods there are several ways to
formulate the state vector, as has been demonstrated. The size of the state vector Z and the general
form of the updating recursion is the same for all ways of writing a state vector. The entries of Z and
of the matrices F and G depend on the particular formulation.
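
The observational equivalence of the 4 x 4 and 3 x 3 systems can be demonstrated by driving both with the same shocks from the same initial values. The Python sketch below is illustrative; the starting values, seed, and helper name `step` are arbitrary choices for this example.

```python
import random

F4 = [[0, 0, 1, 0],
      [0, 0, 0, 1],
      [-0.4, 0, 1.3, -0.9],
      [0, 0, 0.1, 0.7]]
F3 = [[0, 1, 0],
      [-0.4, 1.3, -0.9],
      [0, 0.1, 0.7]]


def step(f, state, shock_vec):
    """One update: new state = f * state + shock vector."""
    return [sum(a * s for a, s in zip(row, state)) + e
            for row, e in zip(f, shock_vec)]


random.seed(7)
# Arbitrary initial values Y1(0), Y2(0), Y1(1), Y2(1), chosen for illustration.
y10, y20, y11, y21 = 2.0, -1.0, 0.5, 3.0
z4 = [y10, y20, y11, y21]   # (Y1 lagged, Y2 lagged, Y1 current, Y2 current)
z3 = [y10, y11, y21]        # (Y1 lagged, Y1 current, Y2 current)

max_gap = 0.0
for _ in range(100):
    e1, e2 = random.gauss(0, 1), random.gauss(0, 1)
    z4 = step(F4, z4, [0.0, 0.0, e1, e2])
    z3 = step(F3, z3, [0.0, e1, e2])
    # Compare the current Y1 and Y2 produced by the two systems.
    max_gap = max(max_gap, abs(z4[2] - z3[1]), abs(z4[3] - z3[2]))

print(max_gap)
```

Dropping the lagged second series changes nothing because its column of coefficients is all zeros.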

Every state vector $Z_t$ that arises from a vector ARMA satisfies a recursive relationship of the form
$Z_{t+1} = FZ_t + GE_{t+1}$. In PROC STATESPACE the state vector $Z_t$ always consists of the current
observations, say, $Y_{1t}$, $Y_{2t}$, and $Y_{3t}$ if you have 3 observed series at each time t, along with
predictions into the future. For example $Z_t$ might contain $Y_{1t}$, $Y_{2t}$, $Y_{3t}$, $\hat{Y}_{1,t+1|t}$, $\hat{Y}_{2,t+1|t}$, and $\hat{Y}_{1,t+2|t}$.
There will be no “gaps”; that is, if the two-step-ahead predictor $\hat{Y}_{1,t+2|t}$ is included, so must be the $Y_1$
predictors up to two steps ahead.

How do you decide what to put in the state vector? Returning to the bivariate VAR of order 2 that is
being used as an example, it is possible from the model to compute the autocorrelation between any
$Y_{it}$ and $Y_{js}$ for the same (i = j) or different vector elements at the same (t = s) or different times.

The data were generated using the innovations variance matrix $\Sigma$ defined as

\[
\Sigma = \mathrm{Var}\begin{pmatrix} e_{1t} \\ e_{2t} \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
\]

The covariance matrix between column vector $Y_t$ and row vector $Y'_{t+j}$, symbolized as $\Gamma(j)$, is
defined as an expected value, namely, you define $\Gamma(j) = E\{Y_tY'_{t+j}\}$ (assuming $Y_t$ has mean 0).
Multiplying the AR(2) on both sides by $Y'_{t-j}$ and taking expected values, you see that

\[
\Gamma(-j) = E\{Y_tY'_{t-j}\} = A_1E\{Y_{t-1}Y'_{t-j}\} + A_2E\{Y_{t-2}Y'_{t-j}\} = A_1\Gamma(-j+1) + A_2\Gamma(-j+2)
\]

for j > 0. For j = 0 you find

\[
\Gamma(0) = A_1\Gamma(1) + A_2\Gamma(2) + \Sigma
\]

Now $\Gamma(-j) = \Gamma'(j)$, so for j > 0 you have

\[
\Gamma(j) = \Gamma(j-1)A_1' + \Gamma(j-2)A_2'
\]

These, then, constitute the multivariate Yule-Walker equations that can be solved to give all the
covariances from the known A coefficient matrices and the innovations variance matrix $\Sigma$. Thus it
would be possible to compute, say, the 6 x 6 covariance matrix M between the vector
$(Y_{1t}, Y_{2t}, Y_{1,t-1}, Y_{2,t-1}, Y_{1,t-2}, Y_{2,t-2})$, these identifying the columns of M, and the vector
$(Y_{1t}, Y_{2t}, Y_{1,t+1}, Y_{2,t+1}, Y_{1,t+2}, Y_{2,t+2})$, these identifying the rows. That matrix would have what is
known as a block Hankel form:

\[
M = \begin{pmatrix}
\Gamma(0) & \Gamma(1) & \Gamma(2) \\
\Gamma(1) & \Gamma(2) & \Gamma(3) \\
\Gamma(2) & \Gamma(3) & \Gamma(4)
\end{pmatrix}
\]



State space researchers describe M as the covariance matrix between a set of current and lagged Ys 
and a set of current and future Ys. For such a matrix M, the following numbers are all the same: 


1. The size of the state vector 

2. The rank of the covariance matrix M 

3. The number of nonzero canonical correlations between the set of current and lagged Ys and 
the set of current and future Ys. 


Items 2 and 3 are always the same for any covariance matrix. (See the PROC CANCOR 
documentation for more information on canonical correlations.) Thus the size of the state vector and 
the nature of the corresponding state space equations can be deduced by studying the covariance 
matrix M. 


With only data, rather than a model with known coefficients, the covariances must be estimated. The 
strategy used is to fit a long vector autoregression whose length is determined by some information 
criterion, and then to use the fitted model as though it were the true structure to construct an estimate 
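of M. The initial autoregressive approximation provides an upper bound for the size of M. Returning
to the order 2 VAR with known coefficient matrices, by substitution, you can see that these $\Gamma(j)$
matrices satisfy the multivariate Yule-Walker equations.

\[
\Gamma(0) = \begin{pmatrix} 39.29 & 1.58 \\ 1.58 & 3.17 \end{pmatrix}, \quad
\Gamma(1) = \begin{pmatrix} 35.47 & 5.04 \\ -2.81 & 2.37 \end{pmatrix}, \quad
\Gamma(2) = \begin{pmatrix} 25.86 & 7.07 \\ -6.42 & 1.38 \end{pmatrix}
\]

\[
\Gamma(3) = \begin{pmatrix} 13.06 & 7.54 \\ -8.46 & 0.32 \end{pmatrix}, \quad
\Gamma(4) = \begin{pmatrix} -0.14 & 6.58 \\ -8.73 & -0.62 \end{pmatrix}
\]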
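
The substitution mentioned above can be carried out numerically. The Python sketch below is an illustrative check that the printed covariance matrices satisfy the multivariate Yule-Walker equations for the example VAR, with an identity innovations variance matrix; since the printed entries are rounded to two decimals, the equations hold only to within a few hundredths. The helper names are assumptions of this example.

```python
A1 = [[1.3, -0.9], [0.1, 0.7]]
A2 = [[-0.4, 0.0], [0.0, 0.0]]
SIGMA = [[1.0, 0.0], [0.0, 1.0]]

GAMMA = {
    0: [[39.29, 1.58], [1.58, 3.17]],
    1: [[35.47, 5.04], [-2.81, 2.37]],
    2: [[25.86, 7.07], [-6.42, 1.38]],
    3: [[13.06, 7.54], [-8.46, 0.32]],
    4: [[-0.14, 6.58], [-8.73, -0.62]],
}


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def add(*mats):
    return [[sum(m[i][j] for m in mats) for j in range(len(mats[0][0]))]
            for i in range(len(mats[0]))]


def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


GAMMA[-1] = transpose(GAMMA[1])   # Gamma(-j) is the transpose of Gamma(j)

gaps = {}
# j = 0: Gamma(0) = A1 Gamma(1) + A2 Gamma(2) + Sigma
gaps[0] = max_abs_diff(
    GAMMA[0], add(matmul(A1, GAMMA[1]), matmul(A2, GAMMA[2]), SIGMA))
# j > 0: Gamma(j) = Gamma(j-1) A1' + Gamma(j-2) A2'
for j in (1, 2, 3, 4):
    rhs = add(matmul(GAMMA[j - 1], transpose(A1)),
              matmul(GAMMA[j - 2], transpose(A2)))
    gaps[j] = max_abs_diff(GAMMA[j], rhs)

for j in sorted(gaps):
    print(j, round(gaps[j], 3))
```

Each printed gap is the largest absolute difference between the two sides of the corresponding equation.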




These in turn lead to a matrix M formed by stacking together the $\Gamma$ matrices in the block Hankel
form previously suggested, namely,

\[
M = \begin{pmatrix}
\Gamma(0) & \Gamma(1) & \Gamma(2) \\
\Gamma(1) & \Gamma(2) & \Gamma(3) \\
\Gamma(2) & \Gamma(3) & \Gamma(4)
\end{pmatrix}
= \begin{pmatrix}
39.29 & 1.58 & 35.47 & 5.04 & 25.86 & 7.07 \\
1.58 & 3.17 & -2.81 & 2.37 & -6.42 & 1.38 \\
35.47 & 5.04 & 25.86 & 7.07 & 13.06 & 7.54 \\
-2.81 & 2.37 & -6.42 & 1.38 & -8.46 & 0.32 \\
25.86 & 7.07 & 13.06 & 7.54 & -0.14 & 6.58 \\
-6.42 & 1.38 & -8.46 & 0.32 & -8.73 & -0.62
\end{pmatrix}
\]


You can diagnose the column dependencies and rank of matrix M using a clever trick. If any column
of a matrix is a linear combination of some others, then a regression of that first column on those
others (no intercept) will fit perfectly. Regressing column 2 of M on column 1 and column 3 on
columns 1 and 2, you find nonzero error sums of squares, indicating that columns 1, 2, and 3 form a
linearly independent set. Regressing any other column on columns 1, 2, and 3 gives a perfect fit and
so shows that the rank of matrix M is 3. For example, you find by regression that
(column 4) = 0.1(column 1) + 0.7(column 2) + 0(column 3), so that the covariance between the column 4
variable, $Y_{2,t+1}$, and any future Y is the same as that between $0.1Y_{1t} + 0.7Y_{2t}$ and that same future Y.
Linear forecasts such as we are considering are functions only of the covariances. Therefore, adding
$\hat{Y}_{2,t+1|t}$ to a set of predictors that already contains $Y_{1t}$ and $Y_{2t}$ does not add any more prediction
accuracy.

Note that the second row of the state space transition matrix F is 0.1, 0.7, 0, so the same regression
that displayed the dependency gives the corresponding row of F. On the other hand, column 3 is not
a perfect linear combination of columns 1 and 2. You get a positive error mean square when
regressing column 3 on columns 1 and 2. Regression reveals that
(column 5) = -0.49(column 1) - 0.63(column 2) + 1.3(column 3) with 0 error sum of squares. Again
note that the coefficients give a row of F. Column 5 is associated with $Y_{1,t+2}$, so even though you
needed $\hat{Y}_{1,t+1|t}$ in the state vector, there is nothing to be gained by including $\hat{Y}_{1,t+2|t}$. The dependent
columns, 4 and 5, thus far considered are the first columns associated with series $Y_2$ and with series
$Y_1$ that show dependencies. These dependencies reveal the number of forecasts of each series that
appear in the state vector (one less than the lag number associated with the dependent column) and
the row of the F matrix associated with the last occurrence of that series in the state vector. Once the
first dependency in each variable has been discovered, the state vector has been completely
determined and no further investigation is needed. Column 6 is automatically a linear combination of
columns 1, 2, and 3 at this point.
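
The regression trick can be tried on the printed M. The Python sketch below is illustrative only: it fits no-intercept least squares regressions through the normal equations, using a small hand-written solver. Because the entries of M are rounded to two decimals, the fits for columns 4 and 5 are nearly, not exactly, perfect, and the recovered coefficients can differ slightly from the rows of F.

```python
M = [
    [39.29, 1.58, 35.47, 5.04, 25.86, 7.07],
    [1.58, 3.17, -2.81, 2.37, -6.42, 1.38],
    [35.47, 5.04, 25.86, 7.07, 13.06, 7.54],
    [-2.81, 2.37, -6.42, 1.38, -8.46, 0.32],
    [25.86, 7.07, 13.06, 7.54, -0.14, 6.58],
    [-6.42, 1.38, -8.46, 0.32, -8.73, -0.62],
]


def column(j):
    """Column j of M, numbered from 1 as in the text."""
    return [row[j - 1] for row in M]


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(aug[r][i]))
        aug[i], aug[p] = aug[p], aug[i]
        for r in range(i + 1, n):
            f = aug[r][i] / aug[i][i]
            for c in range(i, n + 1):
                aug[r][c] -= f * aug[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def regress(target, predictors):
    """No-intercept least squares; returns (coefficients, error sum of squares)."""
    y = column(target)
    xs = [column(j) for j in predictors]
    xtx = [[sum(u * v for u, v in zip(a, b)) for b in xs] for a in xs]
    xty = [sum(u * v for u, v in zip(a, y)) for a in xs]
    coef = solve(xtx, xty)
    fitted = [sum(c * x[i] for c, x in zip(coef, xs)) for i in range(len(y))]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return coef, sse


results = {}
for target, predictors in [(2, [1]), (3, [1, 2]), (4, [1, 2, 3]), (5, [1, 2, 3])]:
    results[target] = regress(target, predictors)
    coef, sse = results[target]
    print(target, [round(c, 2) for c in coef], round(sse, 4))
```

Columns 2 and 3 leave clearly nonzero error sums of squares, while columns 4 and 5 leave essentially none, which is the rank 3 pattern described above.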

A perfectly fitting regression corresponds to a canonical correlation 0 in matrix M. In particular, you
can build a sequence of matrices by sequentially appending columns of M. When you use the first
four columns of M you will get a 0 canonical correlation, but not before. That tells you the fourth
column, and hence $\hat{Y}_{2,t+1|t}$, is redundant information. Leave out that redundant fourth column and
consider a matrix consisting of columns 1, 2, 3, and 5 of M. If that matrix had no 0 canonical
correlations, then $\hat{Y}_{1,t+2|t}$ (associated with column 5) would have been included in the state vector, but
in this example, the addition of column 5 also produces a 0 canonical correlation. Since dependencies
for both series have been discovered, you need not look any further.

When estimated covariances are used to get an estimated M matrix, that matrix, $\hat{M}$, will almost
certainly be of full rank, possibly with some small but nonzero canonical correlations. What is
needed is a statistic to decide if a small estimated canonical correlation in $\hat{M}$ is consistent with the
hypothesis that M has corresponding true canonical correlation 0. A criterion DIC to do so has been
proposed by Akaike. If you build matrices as described above by appending columns of $\hat{M}$, then the
DIC criterion is expected to be negative when the column just added introduces an approximate
dependency. That column would then be omitted from the matrix being built, as would all columns to
its right in $\hat{M}$ that correspond to lagged values of that series. Then the appending would continue,
using only columns of the other series, until dependencies have been discovered in each of the series.
Like any statistical criterion, the DIC is not infallible and other tests, such as Bartlett's test for
canonical correlations, could also be used to test the hypothesis that the newly added column has
introduced a dependency in the system.

Now, if there are moving average components in the series, things become a little more complicated 
and, of course, estimates of the elements of G are also needed. But if you have followed the example, 
you have the idea of how the STATESPACE procedure starts. The long autoregression is run, the 
estimated M matrix is computed from it, the rank is diagnosed, and initial elements of F and G are 
computed by treating estimated covariances as though they were the true ones. Thus the initial 
estimates of F and G fall into the “method of moments” category of estimates. Such estimates are
approximate and are often used, as is the case here, as starting values for more accurate methods such
as maximum likelihood. Another nice feature of the maximum-likelihood method is that large sample
approximate standard errors, based on the derivatives of the likelihood function, can be computed. 
Examples of these standard errors and t tests were seen in Output 6.1 and Output 6.2. 

Additional numerical examples and discussion are given in Section 6.2. Some ideas are reiterated 
there and some details filled in. The reader who feels that Section 6.1 has provided enough 
background may wish to move directly to Section 6.3. The following section is for those interested in 
a more general theoretical discussion. 


6.1.3 Equivalence of State Space and Vector ARMA Models

A general discussion of the state space model, under the name “Markovian representation,” is given 
by Akaike (1974). The following summarizes a main idea from that paper. 

Let $Y_t$ represent a dimension k vector ARMA(p,q) process with mean vector 0, and let $E_t$ be an
uncorrelated sequence of multivariate, mean 0 normal variables with variance matrix $\Sigma$. The
ARMA(p,q) process is

\[
Y_t = A_1Y_{t-1} + \cdots + A_pY_{t-p} + E_t - B_1E_{t-1} - \cdots - B_qE_{t-q}
\]

At time t - 1,

\[
Y_{t-1} = A_1Y_{t-2} + \cdots + A_pY_{t-1-p} + E_{t-1} - B_1E_{t-2} - \cdots - B_qE_{t-1-q}
\]

so substituting in the original expression gives

\[
\begin{aligned}
Y_t = {} & A_1(A_1Y_{t-2} + \cdots + A_pY_{t-1-p} + E_{t-1} - B_1E_{t-2} - \cdots - B_qE_{t-1-q}) \\
& + A_2Y_{t-2} + \cdots + A_pY_{t-p} + E_t - B_1E_{t-1} - \cdots - B_qE_{t-q}
\end{aligned}
\]

which involves current and lagged E vectors and Y vectors prior to time t - 1. Repeated back
substitution produces a convergent expression only in terms of the E vectors, say, $Y_t = \sum_{j=0}^{\infty} \psi_j E_{t-j}$,
provided the series is stationary. The forecast of any $Y_{t+L} = \sum_{j=0}^{\infty} \psi_j E_{t+L-j}$ using information up to
time t would just be the part of this sum that is known at time t, namely, $\hat{Y}_{t+L|t} = \sum_{j=L}^{\infty} \psi_j E_{t+L-j}$, and
the forecast of the same thing at time t + 1 just adds one more term so that $\hat{Y}_{t+L|t+1} = \hat{Y}_{t+L|t} + \psi_{L-1}E_{t+1}$.
So if the state vector $Z_t$ contains $\hat{Y}_{t+L|t}$, then it will also contain the predecessor $\hat{Y}_{t+L-1|t}$. Thus $Z_{t+1}$
will contain $\hat{Y}_{t+L|t+1}$, and the relationship between these, $\hat{Y}_{t+L|t+1} = \hat{Y}_{t+L|t} + \psi_{L-1}E_{t+1}$, will provide k
(the dimension of Y) rows of the state space equation $Z_{t+1} = FZ_t + GE_{t+1}$. In particular, using a
question mark (?) for items not yet discussed, you have


r y. 


f+M-l|f+l 

Y 

f+Mf+1 


fo I 0 

0 0 I 

0 0 0 

? ? ? 


0^ 

0 

I 

? 


( Y ^ 


f+M-2|f 

y 


r i ^ 

¥i 

¥m- 


E, 


What should be the size of the state vector, that is, what should you use as the subscript M? To 
answer that, you look for dependencies. At time $t + L$ the model becomes 

$$Y_{t+L} = A_1 Y_{t+L-1} + \cdots + A_p Y_{t+L-p} + E_{t+L} - B_1 E_{t+L-1} - \cdots - B_q E_{t+L-q}$$

If there were only information up to time $t$, the forecast of $Y_{t+L}$ would be 

$$Y_{t+L|t} = A_1 Y_{t+L-1|t} + \cdots + A_p Y_{t+L-p|t} + E_{t+L|t} - B_1 E_{t+L-1|t} - \cdots - B_q E_{t+L-q|t}$$

where $E_{t+j|t}$ is 0 for $j > 0$ and is the vector of one-step-ahead forecast errors at time $t + j$ if $j \le 0$. 
For leads L exceeding the moving average length q, this becomes 

$$Y_{t+L|t} = A_1 Y_{t+L-1|t} + \cdots + A_p Y_{t+L-p|t} \quad (\text{for } L > q)$$

and so if the state vector $Z_t$ contains $Y_t, Y_{t+1|t}, \ldots, Y_{t+M|t}$, where M is max(p, q + 1), then every 
forecast $Y_{t+L|t}$ with L > M will be a linear combination of these. This establishes M. 
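
The $\psi$ weights and the block count M used above can be computed mechanically. The following is a minimal sketch for the scalar case (k = 1), assuming the recursion $\psi_0 = 1$, $\psi_j = \sum_i A_i \psi_{j-i} - B_j$ that follows from back substitution. The function names and the ARMA(1,1) coefficients 0.6 and 0.4 are illustrative assumptions, not values from the text.

```python
def psi_weights(ar, ma, n):
    """psi_0..psi_n for Y_t = sum(ar[i]*Y_{t-1-i}) + E_t - sum(ma[j]*E_{t-1-j})."""
    psi = [1.0]                                   # psi_0 is the identity (1 for a scalar)
    for j in range(1, n + 1):
        value = -ma[j - 1] if j <= len(ma) else 0.0   # -B_j, zero beyond lag q
        for i in range(1, min(j, len(ar)) + 1):
            value += ar[i - 1] * psi[j - i]           # A_i * psi_{j-i}
        psi.append(value)
    return psi


def state_blocks(p, q):
    """M = max(p, q + 1), the number of k-dimensional blocks before rank reduction."""
    return max(p, q + 1)


# Hypothetical scalar ARMA(1,1): A_1 = 0.6, B_1 = 0.4, so psi_1 = A_1 - B_1.
weights = psi_weights([0.6], [0.4], 4)
print(weights)
print(state_blocks(1, 1))
```

Note that the first weight after $\psi_0$ is $A_1 - B_1$, which is the relationship used later in this section to recover $B_1$ from a state space form.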

Finally, to replace the “?” in F and G, note that $Y_{t+M|t} = A_1 Y_{t+M-1|t} + \cdots + A_p Y_{t+M-p|t}$, which combined 
with $Y_{t+M|t+1} = Y_{t+M|t} + \psi_{M-1} E_{t+1}$ gives the full set of equations. Expanding the set of autoregressive 
coefficient matrices with $A_j = 0$ when $j > p$, you have the complete set of equations 

$$\begin{pmatrix} Y_{t+1} \\ Y_{t+2|t+1} \\ \vdots \\ Y_{t+M-1|t+1} \\ Y_{t+M|t+1} \end{pmatrix}
=
\begin{pmatrix}
0 & I & 0 & \cdots & 0 \\
0 & 0 & I & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & I \\
A_M & A_{M-1} & A_{M-2} & \cdots & A_1
\end{pmatrix}
\begin{pmatrix} Y_t \\ Y_{t+1|t} \\ \vdots \\ Y_{t+M-2|t} \\ Y_{t+M-1|t} \end{pmatrix}
+
\begin{pmatrix} I \\ \psi_1 \\ \vdots \\ \psi_{M-2} \\ \psi_{M-1} \end{pmatrix} E_{t+1}$$

This would be the final state space form if the system were of full rank. Such a full-rank system is 
called “block identifiable.” In this case the link between the ARIMA representation and the state 
space representation is relatively easy to see, unlike the Section 6.2.2 example 

$$\begin{pmatrix} X_t \\ Y_t \end{pmatrix}
=
\begin{pmatrix} .5 & .3 \\ .3 & .5 \end{pmatrix}
\begin{pmatrix} X_{t-1} \\ Y_{t-1} \end{pmatrix}
+
\begin{pmatrix} e_{1,t} \\ e_{2,t} \end{pmatrix}
-
\begin{pmatrix} .2 & .1 \\ 0 & 0 \end{pmatrix}
\begin{pmatrix} e_{1,t-1} \\ e_{2,t-1} \end{pmatrix}$$

where the 4x4 system arising from the current discussion was not block identifiable. It had a linear 
dependency, and a reduction in the size of the system was called for, ultimately producing a state 
space representation with dimension 3. 

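Consider the state space representation 

$$\begin{pmatrix} Y_{t+1} \\ X_{t+1} \\ Y_{t+2|t+1} \end{pmatrix}
=
\begin{pmatrix} 0 & 0 & 1 \\ -.44 & .58 & .8 \\ .044 & -.058 & .72 \end{pmatrix}
\begin{pmatrix} Y_t \\ X_t \\ Y_{t+1|t} \end{pmatrix}
+
\begin{pmatrix} 1 & 0 \\ 0 & 1 \\ .8 & -.6 \end{pmatrix}
\begin{pmatrix} e_{1,t+1} \\ e_{2,t+1} \end{pmatrix}$$

Suppose you want to find the equivalent bivariate ARMA representation for $(X_t, Y_t)'$. From row 2 
you have $X_{t+1|t} = -.44Y_t + .58X_t + .8Y_{t+1|t}$, so that 

$$\begin{aligned}
X_{t+2|t+1} &= (-.44 \;\; .58 \;\; .8) \begin{pmatrix} Y_{t+1} \\ X_{t+1} \\ Y_{t+2|t+1} \end{pmatrix} \\
&= (-.44 \;\; .58 \;\; .8)
\begin{pmatrix} 0 & 0 & 1 \\ -.44 & .58 & .8 \\ .044 & -.058 & .72 \end{pmatrix}
\begin{pmatrix} Y_t \\ X_t \\ Y_{t+1|t} \end{pmatrix}
+ (-.44 \;\; .58 \;\; .8)
\begin{pmatrix} 1 & 0 \\ 0 & 1 \\ .8 & -.6 \end{pmatrix}
\begin{pmatrix} e_{1,t+1} \\ e_{2,t+1} \end{pmatrix} \\
&= (-.22 \;\; 0.29 \;\; 0.60) \begin{pmatrix} Y_t \\ X_t \\ Y_{t+1|t} \end{pmatrix}
+ (.2 \;\; .1) \begin{pmatrix} e_{1,t+1} \\ e_{2,t+1} \end{pmatrix}
\end{aligned}$$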
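
The arithmetic in the last step can be checked directly. The following sketch, using only the F and G entries given above, multiplies the row vector (-.44 .58 .8) into F and into G; the helper name `row_times_matrix` is an illustrative assumption.

```python
# F and G from the three-dimensional state space representation above.
F = [[0.0, 0.0, 1.0],
     [-0.44, 0.58, 0.8],
     [0.044, -0.058, 0.72]]
G = [[1.0, 0.0],
     [0.0, 1.0],
     [0.8, -0.6]]


def row_times_matrix(row, matrix):
    """Return the row vector `row` multiplied by `matrix`."""
    return [sum(row[i] * matrix[i][j] for i in range(len(row)))
            for j in range(len(matrix[0]))]


row2 = F[1]                                   # coefficients defining X_{t+1|t}
new_f_row = [round(v, 6) for v in row_times_matrix(row2, F)]
new_g_row = [round(v, 6) for v in row_times_matrix(row2, G)]
print(new_f_row)   # coefficients on (Y_t, X_t, Y_{t+1|t})
print(new_g_row)   # coefficients on (e_{1,t+1}, e_{2,t+1})
```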




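Inserting this extra line into the state space equations you get 

$$\begin{pmatrix} Y_{t+1} \\ X_{t+1} \\ Y_{t+2|t+1} \\ X_{t+2|t+1} \end{pmatrix}
=
\begin{pmatrix}
0 & 0 & 1 & 0 \\
-.44 & .58 & .8 & 0 \\
.044 & -.058 & .72 & 0 \\
-.22 & .29 & .6 & 0
\end{pmatrix}
\begin{pmatrix} Y_t \\ X_t \\ Y_{t+1|t} \\ X_{t+1|t} \end{pmatrix}
+
\begin{pmatrix} 1 & 0 \\ 0 & 1 \\ .8 & -.6 \\ .2 & .1 \end{pmatrix}
\begin{pmatrix} e_{1,t+1} \\ e_{2,t+1} \end{pmatrix}$$

Now anytime you see $-.44Y_t + .58X_t + .8Y_{t+1|t}$ you can replace it with $X_{t+1|t}$, so row 2 of F can be 
replaced by 0 0 0 1. The result is 

$$\begin{pmatrix} Y_{t+1} \\ X_{t+1} \\ Y_{t+2|t+1} \\ X_{t+2|t+1} \end{pmatrix}
=
\begin{pmatrix}
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
.044 & -.058 & .72 & 0 \\
-.22 & .29 & .6 & 0
\end{pmatrix}
\begin{pmatrix} Y_t \\ X_t \\ Y_{t+1|t} \\ X_{t+1|t} \end{pmatrix}
+
\begin{pmatrix} 1 & 0 \\ 0 & 1 \\ .8 & -.6 \\ .2 & .1 \end{pmatrix}
\begin{pmatrix} e_{1,t+1} \\ e_{2,t+1} \end{pmatrix}$$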



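Anytime you see a multiple of $-.44Y_t + .58X_t$ as the leading term in a row, you can re-express that 
row using $X_{t+1|t}$. Apart from the innovation terms, which are unchanged, row 3 gives 

$$\begin{aligned}
Y_{t+2|t+1} &= -0.1(-.44Y_t + .58X_t + .8Y_{t+1|t}) + 0.8Y_{t+1|t} \\
&= -0.1X_{t+1|t} + 0.8Y_{t+1|t}
\end{aligned}$$

and row 4 is $X_{t+2|t+1} = 0.5X_{t+1|t} + 0.2Y_{t+1|t}$. This system results: 

$$\begin{pmatrix} Y_{t+1} \\ X_{t+1} \\ Y_{t+2|t+1} \\ X_{t+2|t+1} \end{pmatrix}
=
\begin{pmatrix}
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & .8 & -.1 \\
0 & 0 & .2 & .5
\end{pmatrix}
\begin{pmatrix} Y_t \\ X_t \\ Y_{t+1|t} \\ X_{t+1|t} \end{pmatrix}
+
\begin{pmatrix} 1 & 0 \\ 0 & 1 \\ .8 & -.6 \\ .2 & .1 \end{pmatrix}
\begin{pmatrix} e_{1,t+1} \\ e_{2,t+1} \end{pmatrix}$$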
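
As a quick consistency check on the row substitutions, the sketch below expands the reduced rows (0 0 .8 -.1) and (0 0 .2 .5) back into coefficients on $(Y_t, X_t, Y_{t+1|t})$ by substituting $X_{t+1|t} = -.44Y_t + .58X_t + .8Y_{t+1|t}$, and compares them with the rows they replaced. The helper names are illustrative assumptions.

```python
# X_{t+1|t} written in terms of (Y_t, X_t, Y_{t+1|t}), from row 2 of the 3x3 system.
x_forecast = [-0.44, 0.58, 0.8]


def expand(row4):
    """Rewrite a row on (Y_t, X_t, Y_{t+1|t}, X_{t+1|t}) using only the first three."""
    return [row4[i] + row4[3] * x_forecast[i] for i in range(3)]


def close(a, b, tol=1e-9):
    return all(abs(x - y) < tol for x, y in zip(a, b))


reduced_row3 = [0.0, 0.0, 0.8, -0.1]
reduced_row4 = [0.0, 0.0, 0.2, 0.5]
original_row3 = [0.044, -0.058, 0.72]
original_row4 = [-0.22, 0.29, 0.6]

row3_ok = close(expand(reduced_row3), original_row3)
row4_ok = close(expand(reduced_row4), original_row4)
print(row3_ok, row4_ok)
```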


It is seen to be an ARMA(2,1) with $A_2 = 0$; in other words it is the vector ARMA(1,1) 
$\mathbf{Y}_t = A_1 \mathbf{Y}_{t-1} + E_t - B_1 E_{t-1}$ with 

$$A_1 = \begin{pmatrix} 0.8 & -.1 \\ .2 & 0.5 \end{pmatrix}
\quad \text{and} \quad
\psi_1 = \begin{pmatrix} .8 & -.6 \\ .2 & .1 \end{pmatrix}$$

and $\mathbf{Y}_t = (Y_t, X_t)'$. It can be expressed as 
$\mathbf{Y}_t = E_t + (A_1 - B_1)E_{t-1} + \psi_2 E_{t-2} + \psi_3 E_{t-3} + \cdots$, and setting 

$$\psi_1 = A_1 - B_1 = \begin{pmatrix} .8 & -.6 \\ .2 & .1 \end{pmatrix}
= \begin{pmatrix} 0.8 & -.1 \\ .2 & 0.5 \end{pmatrix} - B_1
\quad \text{gives} \quad
B_1 = \begin{pmatrix} 0 & .5 \\ 0 & .4 \end{pmatrix}$$

So you have recovered the original ARMA(1,1) model. A useful feature of such a structure is that it 
sometimes gives a nice interpretation; for example, it is clear from the ARMA model that lagged 
shocks to Y do not have any effect on Y or X, while that is not so clear from the state space 
representation. 


6.2 More Examples 


6.2.1 Some Univariate Examples 

Univariate models can also be expressed in state space form, and doing so provides some insights. 
Consider the AR(1) model where 
$$Y_t - 100 = .6(Y_{t-1} - 100) + e_t$$

Suppose you are given $Y_1, Y_2, Y_3, \ldots, Y_{100}$. You can forecast $Y_{101}, Y_{102}, Y_{103}, \ldots$. In fact, the only 
value you need to know is $Y_{100}$. If $Y_{100} = 150$, then $\hat{Y}_{101} = 130$, $\hat{Y}_{102} = 118$, $\hat{Y}_{103} = 110.8$, .... If you 
observe $Y_{101} = 140$ at time 101, the forecasts of $Y_{102}, Y_{103}, Y_{104}, \ldots$ change to $\hat{Y}_{102} = 124$, $\hat{Y}_{103} = 114.4$, 
.... The point is that given the model, you need to know only the last Y, $Y_n$, to forecast as far into the 
future as you like. The forecasts are updated as new information is obtained. 
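
The AR(1) forecasts quoted above can be reproduced with a few lines of code. This is a minimal sketch; the function name `ar1_forecasts` and its argument names are illustrative assumptions, while the mean 100 and coefficient .6 come from the model above.

```python
def ar1_forecasts(last_value, steps, mean=100.0, phi=0.6):
    """Forecast an AR(1) series `steps` periods ahead from the last observed value."""
    forecasts = []
    deviation = last_value - mean
    for _ in range(steps):
        deviation *= phi              # each lead shrinks the deviation by phi
        forecasts.append(mean + deviation)
    return forecasts


from_150 = ar1_forecasts(150.0, 3)    # forecasts of Y_101, Y_102, Y_103 given Y_100 = 150
from_140 = ar1_forecasts(140.0, 2)    # updated forecasts of Y_102, Y_103 given Y_101 = 140
print([round(v, 4) for v in from_150])
print([round(v, 4) for v in from_140])
```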

Consider the AR(2) model 

$$Y_t - 100 = 1.2(Y_{t-1} - 100) - .36(Y_{t-2} - 100) + e_t$$

Again, suppose you know $Y_1, Y_2, \ldots, Y_{100}$ with $Y_{100} = 150$. Knowing $Y_{100}$ is not enough to forecast 
$Y_{101}$. You need more information. If you know $Y_{99} = 110$, then 

$$\hat{Y}_{101} = 100 + 1.2(50) - .36(10) = 156.4$$

and 

$$\hat{Y}_{102} = 100 + 1.2(56.4) - .36(50) = 149.68, \ldots$$

In this example, you need to know two pieces of information: $Y_{99} = 110$ and $Y_{100} = 150$, or $Y_{100} = 150$ and 
$\hat{Y}_{101} = 156.4$. Either pair of numbers allows you to forecast the series as far into the future as you 
like. The vector with the information you need is the state vector $Z_t$. 
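
The same check for the AR(2) model shows that either pair of numbers leads to the same forecasts. This is a sketch only; the function name `ar2_forecasts` is an illustrative assumption.

```python
def ar2_forecasts(previous, last, steps, mean=100.0, phi1=1.2, phi2=-0.36):
    """Forecast an AR(2) series from its two most recent values (or forecasts)."""
    d_prev, d_last = previous - mean, last - mean
    out = []
    for _ in range(steps):
        d_next = phi1 * d_last + phi2 * d_prev
        out.append(mean + d_next)
        d_prev, d_last = d_last, d_next   # slide the two-value window forward
    return out


from_data = ar2_forecasts(110.0, 150.0, 2)        # uses Y_99 and Y_100
from_pair = ar2_forecasts(150.0, from_data[0], 1) # uses Y_100 and the forecast of Y_101
print([round(v, 4) for v in from_data])
print([round(v, 4) for v in from_pair])
```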

For the AR(2) model, the state vector is 

$$Z_t = (Y_t - 100,\; Y_{t+1|t} - 100)'$$

where the prime symbol (') indicates the transpose of the row vector. Recall that $Y_{t+k|t}$ denotes the 
forecast of $Y_{t+k}$ given the data $Y_1, Y_2, \ldots, Y_t$. 

Now 

$$Y_{t+1|t} - 100 = 1.2(Y_t - 100) - .36(Y_{t-1} - 100)$$

for the AR(2) model. When the data at time $t + 1$ become available, the state vector changes to 

$$\begin{aligned}
Z_{t+1} &= (Y_{t+1} - 100,\; Y_{t+2|t+1} - 100)' \\
&= (Y_{t+1} - 100,\; 1.2(Y_{t+1} - 100) - .36(Y_t - 100))'
\end{aligned}$$

Because 

$$Y_{t+1} - 100 = (Y_{t+1|t} - 100) + e_{t+1}$$

you can write 

$$Z_{t+1} = \begin{pmatrix} 0 & 1 \\ -.36 & 1.2 \end{pmatrix} Z_t + \begin{pmatrix} 1 \\ 1.2 \end{pmatrix} e_{t+1}$$

The first line of the matrix equation is simply 

$$Y_{t+1} - 100 = Y_{t+1|t} - 100 + e_{t+1}$$

or 

$$Y_{t+1} - 100 = 1.2(Y_t - 100) - .36(Y_{t-1} - 100) + e_{t+1}$$

The last line of the matrix equation becomes 

$$\begin{aligned}
Y_{t+2|t+1} - 100 &= -.36(Y_t - 100) + 1.2(Y_{t+1|t} - 100) + 1.2e_{t+1} \\
&= -.36(Y_t - 100) + 1.2(Y_{t+1} - 100)
\end{aligned}$$

because 

$$Y_{t+1|t} - 100 + e_{t+1} = Y_{t+1} - 100$$

Two examples have been discussed thus far. In the first, an AR(1) model, the state vector was 

$$Z_t = Y_t - 100$$

and 

$$Z_{t+1} = .6Z_t + e_{t+1}$$

In the second example, an AR(2) model, the state vector was 

$$Z_t = (Y_t - 100,\; Y_{t+1|t} - 100)'$$

and 

$$Z_{t+1} = FZ_t + GE_{t+1}$$

where 

$$F = \begin{pmatrix} 0 & 1 \\ -.36 & 1.2 \end{pmatrix} \quad \text{and} \quad G = \begin{pmatrix} 1 \\ 1.2 \end{pmatrix}$$
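
The update equation can be exercised numerically. The sketch below starts from the state implied by $Y_{99} = 110$ and $Y_{100} = 150$, applies a shock of 2 at time 101 (a made-up value used only for illustration), and confirms that the state space update agrees with recomputing the new state directly from the AR(2) model.

```python
F = [[0.0, 1.0], [-0.36, 1.2]]
G = [1.0, 1.2]


def update(z, shock):
    """One step of Z_{t+1} = F Z_t + G e_{t+1} for the AR(2) example."""
    return [sum(F[i][j] * z[j] for j in range(2)) + G[i] * shock for i in range(2)]


# Z_100 = (Y_100 - 100, Y_101|100 - 100)' with Y_99 = 110 and Y_100 = 150.
z100 = [50.0, 1.2 * 50.0 - 0.36 * 10.0]
shock = 2.0                              # hypothetical e_101, not from the text
z101 = update(z100, shock)

# Direct route: observe Y_101, then forecast Y_102 from Y_101 and Y_100.
y101_dev = z100[1] + shock
direct = [y101_dev, 1.2 * y101_dev - 0.36 * z100[0]]
print([round(v, 4) for v in z101], [round(v, 4) for v in direct])
```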


Every ARMA model has an associated state vector $Z_t$ and an updating equation of the form 

$$Z_{t+1} = FZ_t + GE_{t+1}$$

To determine the state vector for a univariate ARMA time series $Y_t$, consider the sequence $Y_t, Y_{t+1|t}, 
Y_{t+2|t}, \ldots$. At some point, such as $k + 1$, you find that $Y_{t+k+1|t}$ is a linear combination of the previous 
sequence of elements $Y_t, Y_{t+1|t}, \ldots, Y_{t+k|t}$; that is, 

$$Y_{t+k+1|t} = \alpha_0 Y_t + \alpha_1 Y_{t+1|t} + \cdots + \alpha_k Y_{t+k|t}$$

In the AR(2) example, 

$$Y_{t+2|t} - 100 = 1.2(Y_{t+1|t} - 100) - .36(Y_t - 100)$$

so $k = 1$. This determines the state vector as 

$$Z_t = (Y_t, Y_{t+1|t}, \ldots, Y_{t+k|t})'$$

Furthermore, any prediction of $Y_{t+R|t}$ with $R > k$ is also a linear combination of state vector elements. 
Think of constructing the state vector by sequentially including forecasts $Y_{t+j|t}$ into $Z_t$ until you reach 
the first forecast that is linearly dependent on forecasts already in $Z_t$. At that point, stop expanding 
$Z_t$. Section 6.1.2 shows how this can be accomplished using canonical correlations. One more 
univariate example follows. 

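Suppose 

$$Y_t = e_t + .8e_{t-1}$$

Then, because 

$$Y_{t+1|t} = .8e_t$$

and, for $j > 1$, 

$$Y_{t+j|t} = 0$$

(which is $\alpha_0 Y_t + \alpha_1 Y_{t+1|t}$ with $\alpha_0 = \alpha_1 = 0$), the state vector has the following form: 

$$Z_t = \begin{pmatrix} Y_t \\ Y_{t+1|t} \end{pmatrix} = \begin{pmatrix} Y_t \\ .8e_t \end{pmatrix}$$

and 

$$Z_{t+1} = \begin{pmatrix} Y_{t+1} \\ .8e_{t+1} \end{pmatrix}$$

Note that 

$$\begin{pmatrix} Y_{t+1} \\ .8e_{t+1} \end{pmatrix}
= \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}
\begin{pmatrix} Y_t \\ .8e_t \end{pmatrix}
+ \begin{pmatrix} 1 \\ .8 \end{pmatrix} e_{t+1}$$

which is equivalent to the equation 

$$Y_{t+1} = e_{t+1} + .8e_t$$

along with the identity 

$$.8e_{t+1} = .8e_{t+1}$$

Thus, 

$$F = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \quad \text{and} \quad G = \begin{pmatrix} 1 \\ .8 \end{pmatrix}$$

for the moving average (MA) model. 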
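
To see that this F and G really reproduce the MA(1) series, the sketch below runs the state space recursion on a short list of shocks and compares the first state element with $e_t + .8e_{t-1}$ computed directly. The shock values are arbitrary illustrative numbers, and the pre-sample shock is assumed to be 0.

```python
F = [[0.0, 1.0], [0.0, 0.0]]
G = [1.0, 0.8]
shocks = [0.5, -1.0, 0.25, 2.0, -0.75]    # hypothetical e_1..e_5, with e_0 = 0

# State space recursion Z_{t+1} = F Z_t + G e_{t+1}, starting from Z_0 = (0, 0)'.
z = [0.0, 0.0]
from_state = []
for e in shocks:
    z = [F[i][0] * z[0] + F[i][1] * z[1] + G[i] * e for i in range(2)]
    from_state.append(z[0])               # first element of the state is Y_t

# Direct MA(1) calculation Y_t = e_t + .8 e_{t-1}.
previous = [0.0] + shocks[:-1]
direct = [e + 0.8 * e_prev for e, e_prev in zip(shocks, previous)]
print([round(v, 4) for v in from_state])
```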

The truly useful fact is that all multivariate ARMA models have state space equations. To construct 
the state vector for a multivariate process, you forecast each element of the process. For the bivariate 
process $(X_t, Y_t)$, for example, consider the sequence $X_t, Y_t, X_{t+1|t}, Y_{t+1|t}, X_{t+2|t}, Y_{t+2|t}, \ldots$. When you 
first reach a forecast of X (or Y) that is a linear combination of elements currently in the state vector, 
do not include that forecast or any future forecasts of that variable in the state vector. Continue 
including forecasts of the other variable in $Z_t$ until you reach a point of linear dependence in that 
variable. You have seen an AR(2) of dimension 2 in Section 6.1.2. A bivariate ARMA(1,1) is shown 
next. 


6.2.2 ARMA(1,1) of Dimension 2 

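Consider the model 

$$\begin{pmatrix} X_t \\ Y_t \end{pmatrix}
= \begin{pmatrix} .5 & .3 \\ .3 & .5 \end{pmatrix}
\begin{pmatrix} X_{t-1} \\ Y_{t-1} \end{pmatrix}
+ \begin{pmatrix} e_{1,t} \\ e_{2,t} \end{pmatrix}
- \begin{pmatrix} .2 & .1 \\ 0 & 0 \end{pmatrix}
\begin{pmatrix} e_{1,t-1} \\ e_{2,t-1} \end{pmatrix}$$

or 

$$X_t = .5X_{t-1} + .3Y_{t-1} + e_{1,t} - .2e_{1,t-1} - .1e_{2,t-1}$$

and 

$$Y_t = .3X_{t-1} + .5Y_{t-1} + e_{2,t}$$

from which 

$$X_{t+1|t} = .5X_t + .3Y_t - .2e_{1,t} - .1e_{2,t}$$

$$Y_{t+1|t} = .3X_t + .5Y_t$$

and 

$$X_{t+2|t} = .5X_{t+1|t} + .3Y_{t+1|t} = .5X_{t+1|t} + .09X_t + .15Y_t$$

so 

$$Z_t = (X_t, Y_t, X_{t+1|t})'$$

Finally, the state space form is 

$$\begin{pmatrix} X_{t+1} \\ Y_{t+1} \\ X_{t+2|t+1} \end{pmatrix}
= \begin{pmatrix} 0 & 0 & 1 \\ .3 & .5 & 0 \\ .09 & .15 & .5 \end{pmatrix}
\begin{pmatrix} X_t \\ Y_t \\ X_{t+1|t} \end{pmatrix}
+ \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ .3 & .2 \end{pmatrix}
\begin{pmatrix} e_{1,t+1} \\ e_{2,t+1} \end{pmatrix}$$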
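
One way to convince yourself that the three-dimensional state space form carries the same information as the bivariate ARMA(1,1) is to run both recursions on the same shocks. The following sketch does that, assuming zero pre-sample values and using made-up shock pairs; the variable names are illustrative.

```python
F = [[0.0, 0.0, 1.0], [0.3, 0.5, 0.0], [0.09, 0.15, 0.5]]
G = [[1.0, 0.0], [0.0, 1.0], [0.3, 0.2]]
shocks = [(1.0, -0.5), (0.2, 0.4), (-0.3, 0.1), (0.0, 0.6)]   # hypothetical (e1, e2)

# State space recursion, starting from Z_0 = (X_0, Y_0, X_{1|0})' = 0.
z = [0.0, 0.0, 0.0]
state_path = []
for e in shocks:
    z = [sum(F[i][j] * z[j] for j in range(3)) + G[i][0] * e[0] + G[i][1] * e[1]
         for i in range(3)]
    state_path.append((z[0], z[1]))       # first two elements are X_t and Y_t

# Direct ARMA(1,1) recursion with zero pre-sample values and shocks.
x_prev = y_prev = 0.0
e_prev = (0.0, 0.0)
arma_path = []
for e in shocks:
    x = 0.5 * x_prev + 0.3 * y_prev + e[0] - 0.2 * e_prev[0] - 0.1 * e_prev[1]
    y = 0.3 * x_prev + 0.5 * y_prev + e[1]
    arma_path.append((x, y))
    x_prev, y_prev, e_prev = x, y, e

max_gap = max(abs(a - b) for s, m in zip(state_path, arma_path) for a, b in zip(s, m))
print(max_gap)
```

The third row of G, (.3 .2), is the first row of A - B, which is why the two paths coincide.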


6.3 PROC STATESPACE 

The general outline of PROC STATESPACE is as follows: 

1. For a multivariate or vector series, for example, 

$$X_t = (X_{1,t}, X_{2,t}, X_{3,t})'$$

fit a multivariate AR model 

$$X_t = A_1 X_{t-1} + A_2 X_{t-2} + \cdots + A_k X_{t-k} + E_t$$

You can do this row by row. That is, the regression of $X_{1,t}$ on $X_{1,t-1}, X_{2,t-1}, X_{3,t-1}, 
X_{1,t-2}, X_{2,t-2}, X_{3,t-2}, \ldots, X_{3,t-k}$ produces the top rows of matrices $A_1, A_2, A_3, \ldots, A_k$. Using 
$X_{2,t}$ and $X_{3,t}$ as dependent variables in the regression produces the second and third rows of 
the A matrices. This is essentially what is done in PROC STATESPACE. To decide on k, 
you use a version of Akaike's information criterion (AIC). This criterion is 

AIC = -2LOG(maximized likelihood) + 2(number of parameters in the model) 

Note that AIC is made smaller by a decrease in the number of model parameters or an 
increase in the likelihood function. Thus, it trades off precision of fit against the number of 
parameters used to obtain that fit. Select k to minimize AIC. 

2. The model is now called a vector autoregression of order k, and you have a measure, AIC, of 
its fit. The question becomes whether the fit can be improved by allowing MA terms and 
setting to 0 some of the elements of the A matrices. In other words, search the class of all 
vector ARMA models that can be reasonably approximated by a vector autoregression of 
order k. The smallest canonical correlation is used to assess the fit of each model against the 
fit of the preliminary vector autoregression. 

For example, a vector ARMA(1,1) of dimension 3 can be 

$$\begin{pmatrix} X_{1,t} \\ X_{2,t} \\ X_{3,t} \end{pmatrix}
= \begin{pmatrix} .72 & 0 & 0 \\ 0 & .6 & .4 \\ 0 & .4 & .8 \end{pmatrix}
\begin{pmatrix} X_{1,t-1} \\ X_{2,t-1} \\ X_{3,t-1} \end{pmatrix}
+ \begin{pmatrix} e_{1,t} \\ e_{2,t} \\ e_{3,t} \end{pmatrix}
- \begin{pmatrix} .4 & .3 & .1 \\ 0 & .2 & .1 \\ 0 & 0 & .4 \end{pmatrix}
\begin{pmatrix} e_{1,t-1} \\ e_{2,t-1} \\ e_{3,t-1} \end{pmatrix}$$

or 

$$X_t = AX_{t-1} + E_t - BE_{t-1}$$

Check to see if this model fits as well as the original vector autoregression. If k = 4, for 
example, the original autoregression contains four A matrices, each with nine parameters. If 
the vector ARMA(1,1) fits about as well as the vector AR(4) in likelihood, the inherent 
penalty in the information criterion for a large number of parameters can make the 
information criterion for the vector ARMA(1,1) smaller than for the vector AR(4), and thus 
the difference will be negative. An information criterion for model selection based on this 
idea is called DIC in PROC STATESPACE. 

3. The comparison in step 2 is easier than it first appears. All vector ARMA models can be 
expressed in state space form. Thus, comparing state space models and determining the best 
model is equivalent to finding the dimension of the best model's state vector $Z_t$, because all 
state space models have the same basic form, 

$$Z_t = FZ_{t-1} + GE_t$$

The key to this decision is an organized sequential formulation of the state vector. Start by 
including $X_{1,t}$, $X_{2,t}$, and $X_{3,t}$. Next, check $X_{1,t+1|t}$ to see if it is a linear combination of $X_{1,t}$, $X_{2,t}$, 
and $X_{3,t}$. If it is, it provides no new information and is not added to the state vector. 
Otherwise, the state vector is augmented to $(X_{1,t}, X_{2,t}, X_{3,t}, X_{1,t+1|t})$. 

The next question is whether $X_{2,t+1|t}$ should be included in the state vector. Include it only if it 
cannot be written as a linear combination of elements already in the state vector. The state 
vector is formulated sequentially in this fashion. Suppose $X_{1,t+1|t}$ is included and both $X_{2,t+1|t}$ 
and $X_{3,t+1|t}$ have been tested. Next, consider testing $X_{1,t+2|t}$ for inclusion in the state vector. If 
$X_{1,t+2|t}$ is not included in the state vector, pioneering work by Akaike shows that $X_{1,t+j|t}$ is not 
included for any $j > 2$. That is, if a forecast $X_{1,t+k|t}$ is a linear combination of elements already in 
the state vector, $X_{1,t+j|t}$ also is such a linear combination for any $j > k$. At this point, stop 
considering the forecast of $X_1$, but continue to consider forecasts of $X_2$ and $X_3$ (unless $X_{2,t+1|t}$ 
or $X_{3,t+1|t}$ was found earlier to be a linear combination of elements already in the state vector) 
and continue in this fashion. 

For this example, the state vector may be 

$$Z_t = (X_{1,t}, X_{2,t}, X_{3,t}, X_{1,t+1|t}, X_{3,t+1|t}, X_{1,t+2|t})'$$

In this case, $X_{1,t+3|t}$, $X_{2,t+1|t}$, and $X_{3,t+2|t}$ are linear combinations of elements already present in 
the state vector. 


4. PROC STATESPACE uses the initial vector AR(k) approximation to estimate covariances 
between the elements of $Z_{t+1}$ and $Z_t$. It uses these in a manner similar to the Yule-Walker 
equations to compute initial estimates of the elements of F and G. Recall that the model, in 
state space form, is 

$$Z_{t+1} = FZ_t + GE_{t+1}$$

for any underlying multivariate vector ARMA model. Assuming $E_t$ is a sequence of 
independent normal random vectors with mean 0 and variance-covariance matrix $\Sigma$, you can 
write out the likelihood function. Start with the initial estimates, and use a nonlinear search 
routine to find the maximum-likelihood (ML) estimates of the parameters and their 
asymptotically valid standard errors. 

You can obtain an estimate, $\hat{\Sigma}$, of $\Sigma$ from the sums of squares and crossproducts of the one-step-ahead 
forecast errors of each series. Such a multivariate setting has several error 
variances (these are the diagonal elements of $\Sigma$). A general measure of the size of $\hat{\Sigma}$ is its 
determinant $|\hat{\Sigma}|$, which can be printed out in PROC STATESPACE. This determinant should 
be minimized because it is a general measure of prediction error variance. 


5. Because the first few elements of the state vector make up the multivariate series to be 
forecast, use the state space equation to forecast future values $Z_{t+k}$ and then extract the first 
elements. These are the forecasts of the multivariate series. In addition, the state space 
equation yields the prediction error variances. Consider, for example, forecasting three 
periods ahead. Now 

$$\begin{aligned}
Z_{t+3} &= FZ_{t+2} + GE_{t+3} = F^2 Z_{t+1} + FGE_{t+2} + GE_{t+3} \\
&= F^3 Z_t + (F^2 GE_{t+1} + FGE_{t+2} + GE_{t+3})
\end{aligned}$$

If the original vector process has the three elements $X_{1,t}$, $X_{2,t}$, and $X_{3,t}$ considered before, the 
forecasts $X_{1,t+3}$, $X_{2,t+3}$, and $X_{3,t+3}$ are the first three elements of $F^3 Z_t$. The variance-covariance 
matrix of the quantity in parentheses contains the prediction-error variance-covariance 
matrix. It can be estimated from estimates of F, G, and $\Sigma$. 


6. In PROC STATESPACE, an output data set is created that contains forecasts, forecast 

standard errors, and other information useful for displaying forecast intervals. The theoretical 
result that allows this process to be practical is the use of canonical correlations to accomplish 
step 3. Akaike (1976) showed how this can be done. Note that up to this point, state vectors 
have been derived only for known models. In practice, you do not know the vector ARMA 
model form, and you must use the canonical correlation approach (see Section 6.3.2) to 
compute the state vector. 
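
The prediction error variance described in step 5 can be sketched for the simplest case. The code below assumes a univariate series with scalar innovations, uses the MA(1) matrices F and G from Section 6.2.1, and takes an innovation variance of 1 purely for illustration. It accumulates the variance of the first state element's forecast error as the sum of squared first elements of $G, FG, F^2G, \ldots$ times the innovation variance. The function names are assumptions, not PROC STATESPACE internals.

```python
def mat_vec(matrix, vector):
    """Multiply a square matrix by a column vector."""
    return [sum(row[j] * vector[j] for j in range(len(vector))) for row in matrix]


def forecast_error_variance(F, G, sigma2, steps):
    """Variance of the `steps`-ahead forecast error of the first state element."""
    total = 0.0
    column = list(G)                      # F^0 G
    for _ in range(steps):
        total += sigma2 * column[0] ** 2  # contribution of one future shock
        column = mat_vec(F, column)       # move to the next power, F^(j+1) G
    return total


F = [[0.0, 1.0], [0.0, 0.0]]              # MA(1) example from Section 6.2.1
G = [1.0, 0.8]
variances = [forecast_error_variance(F, G, 1.0, k) for k in (1, 2, 3)]
print([round(v, 4) for v in variances])
```

For the MA(1) case the variance stops growing after two steps, which matches the fact that forecasts beyond one period ahead are all zero.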
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6.3.1 State Vectors Determined from Covariances 

PROC STATESPACE computes the sequence of information criterion values for 
k = 1, 2, ..., 10 (or ARMAX) 

and selects the model that gives the minimum. 
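
The selection rule is simply a minimum over the candidate lags. As a small sketch, the code below applies it to the information criterion values that appear in Output 6.4 later in this chapter, and also writes out the AIC formula from Section 6.3. The function name `aic` and the sample arguments in the final print are illustrative assumptions.

```python
def aic(log_likelihood, n_parameters):
    """AIC = -2 LOG(maximized likelihood) + 2 (number of parameters)."""
    return -2.0 * log_likelihood + 2.0 * n_parameters


# Information criterion values for lags 0..10, as listed in Output 6.4.
criteria = [55.86487, 27.05937, 13.52724, 13.8878, 9.994428, 11.95021,
            12.7582, 10.71094, 12.64703, 14.5518, 16.48957]

best_lag = min(range(len(criteria)), key=criteria.__getitem__)
print(best_lag, criteria[best_lag])
print(aic(-10.0, 3))                      # hypothetical log likelihood and parameter count
```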

This vector AR model for the time series is used to compute a variance-covariance matrix M 
between the set of current and past values and the set of current and future values. 

These two facts are relevant: 

1. All predictions are linear combinations of the observations $Y_t, Y_{t-1}, Y_{t-2}, \ldots$, where for 
practical purposes this list can be truncated at $Y_{t-k}$, as determined by the initial 
vector autoregression. 

2. The covariance between a prediction $Y_{t+j|t}$ and a current or past value $Y_{t-i}$ is the same as 
that between $Y_{t+j}$ and $Y_{t-i}$. 

Akaike uses these facts to show that analyzing the covariances in the matrix M is equivalent to 
determining the form of the state vector. Canonical correlations are used in this case, and the 
elements of M are replaced by their sample values. 


6.3.2 Canonical Correlations 


Suppose the covariance matrix between the set of current and past values $Y_t, Y_{t-1}, Y_{t-2}$ and the set of 
current and future values $Y_t, Y_{t+1}, Y_{t+2}$ for a univariate series is given by 

$$M = \begin{pmatrix} 8 & 4 & 2 \\ 4 & 2 & 1 \\ 2 & 1 & .5 \end{pmatrix}$$


Note that there are no zero correlations. (You will find, however, that some canonical correlations 
are zero.) 

Canonical correlation analysis proceeds as follows: 

1. Find the linear combination of elements in the first vector $(Y_t, Y_{t-1}, Y_{t-2})$ and second vector 
$(Y_t, Y_{t+1}, Y_{t+2})$ with maximum cross-correlation. This canonical correlation is the largest, and 
the linear combinations are called canonical variables. In the example, $Y_t$ and $Y_t$ are perfectly 
correlated, so $(1, 0, 0)(Y_t, Y_{t-1}, Y_{t-2})'$ and $(1, 0, 0)(Y_t, Y_{t+1}, Y_{t+2})'$ are the canonical variables. 

2. Now consider all linear combinations of elements in the original vectors that are not correlated 
with the first canonical variables. Of these, the two most highly correlated give the second- 
highest canonical correlation. In this case, you can show that the next-highest canonical 
correlation is, in fact, zero. 

3. At each stage, consider linear combinations that are uncorrelated with the canonical variables 
found thus far. Pick the two (one for each vector being analyzed) with the highest 
cross-correlation. 

Akaike establishes that the following numbers are all the same for general vector ARMA 
models: 

□ the rank of an appropriate M 

□ the number of nonzero canonical correlations 

□ the dimension of the state vector. 

When you look at the example matrix M, you see that the covariance between $Y_{t-k}$ and $Y_{t+j}$ is always 
$8(.5)^{j+k}$ for $j, k \ge 0$. Thus, M is the covariance matrix of an AR(1) process. All rows of M are direct 
multiples of the first, so M has rank 1. Finally, the canonical correlations computed from M are 
1, 0, 0. 
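
Both claims about the example matrix are easy to verify numerically. The sketch below checks the $8(.5)^{j+k}$ pattern entry by entry and computes the rank with a small Gaussian elimination routine. The `rank` helper is an illustrative stand-in written for this example, not part of any SAS procedure.

```python
M = [[8.0, 4.0, 2.0], [4.0, 2.0, 1.0], [2.0, 1.0, 0.5]]


def rank(matrix, tol=1e-12):
    """Rank by Gaussian elimination with partial pivoting."""
    rows = [row[:] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        pivot = max(range(r, len(rows)), key=lambda i: abs(rows[i][col]))
        if abs(rows[pivot][col]) < tol:
            continue                      # no usable pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


# Entry (k, j) is the covariance between Y_{t-k} and Y_{t+j}.
pattern_ok = all(abs(M[k][j] - 8.0 * 0.5 ** (j + k)) < 1e-12
                 for k in range(3) for j in range(3))
print(pattern_ok, rank(M))
```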

When the general sample covariance matrix M is used, the analysis proceeds as follows (illustration 
for bivariate series $(X_t, Y_t)'$): 

1. Determine the number of lags into the past $(X_t, Y_t, X_{t-1}, Y_{t-1}, \ldots, Y_{t-k})$. 

2. Do a canonical correlation of $(X_t, Y_t)$ with current and past values. This produces correlations 
1, 1. 

3. Next, do a canonical correlation analysis of $(X_t, Y_t, X_{t+1|t})$ with current and past values of $X_t$, 
$Y_t$ from step 1. 

4. a) If the smallest canonical correlation is not close to zero, include $X_{t+1|t}$ in the state vector and 
analyze $(X_t, Y_t, X_{t+1|t}, Y_{t+1|t})$. 

b) If the smallest canonical correlation is close to zero, exclude from consideration $X_{t+1|t}$ and 
all $X_{t+j|t}$ for $j > 1$. In this case, the analysis of $(X_t, Y_t, Y_{t+1|t})$ is next. 

5. Continue until you have determined the first predictions, $X_{t+j|t}$ and $Y_{t+s|t}$, that introduce zero 
canonical correlations. Then $X_{t+j-1|t}$ and $Y_{t+s-1|t}$ are the last predictions of X and Y to be 
included in the state vector. 

PROC STATESPACE executes this procedure automatically. The sample canonical correlations are 
judged by the aforementioned DIC. A chi-square test statistic attributed to Bartlett (1947) is 
computed. The significance of Bartlett’s statistic indicates a nonzero canonical correlation. Robinson 
(1973) suggests potential problems with Bartlett’s test for MA models. Thus, DIC is used as the 
default criterion. 

PROC STATESPACE uses the estimated covariance matrix and the identified state vector to 
compute initial estimates of matrices F and G in the state space representation. The advantage of 
PROC STATESPACE is its automatic identification of a model and preliminary parameter 
estimation, but the user is responsible for any transformation necessary to produce stationarity and 
approximate normality. Also note that the STATESPACE theory does not include deterministic 
components like polynomials in time. 

Use the NOEST option to view the preliminary model before fine-tuning the parameter estimates 
through nonlinear iterative least squares (LS) or ML estimation. 

You may want to use the RESTRICT statement to set certain elements of F and G to zero. (You have 
seen several cases where F and G contain zeros.) 


6.3.3 Simulated Example 

To see how PROC STATESPACE works with a known univariate model, consider 100 observations 
from an MA(1) model 

$$Y_t = e_t + .8e_{t-1}$$

Note that the model can be re-expressed as 

$$Y_t = (.8Y_{t-1} - .64Y_{t-2} + (.8)^3 Y_{t-3} - (.8)^4 Y_{t-4} + \cdots) + e_t$$

Thus, the initial AR approximation should have coefficients near .8, -.64, .512, -.4096, .... Use the 
following SAS statements for the analysis: 

   PROC STATESPACE CANCORR ITPRINT DATA=TEST2;
      VAR Y;
   RUN;


As shown in Output 6.4, the CANCORR option displays the sequential construction of the state 
vector. The ITPRINT option shows the iterative steps of the likelihood maximization. 

In Output 6.4, observe the sample mean, $\bar{Y}$, and standard deviation and the sequence of AICs 
for up to ten AR lags. The smallest AIC in the list is 9.994428, which occurs at lag 4. Thus, the 
initial AR approximation involves four lags and is given by 

$$Y_t = .79Y_{t-1} - .58Y_{t-2} + .31Y_{t-3} - .24Y_{t-4} + e_t$$

This corresponds reasonably well with the theoretical results. Schematic representations of the 
autocorrelation function (ACF) and partial autocorrelation function (PACF) are also given. A 
plus sign (+) indicates a value more than two standard errors above 0, a period (.) indicates a value 
within two standard errors of 0, and a minus sign (-) indicates a value more than two standard errors 
below 0. Based on results from Chapter 3, “The General ARIMA Model,” you would expect the 
following sequences of + and - signs in the theoretical ACF and PACF plots. 

   LAG    0 1 2 3 4 5 6 7 8 9 10 

   ACF    + + . . . . . . . . . 

   PACF   + - + - + - + - + - + 


You also would expect the estimated PACF to drop within two standard errors of 0 after a few lags. 
The estimated functions correspond fairly well with the theoretical functions. 
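
The theoretical values behind this comparison can be written down in a few lines. The sketch below generates the AR(∞) coefficients implied by the MA(1) model with coefficient .8, compares their signs with the Yule-Walker estimates reported in Output 6.4, and computes the lag 1 autocorrelation from the standard MA(1) formula $\theta / (1 + \theta^2)$. Variable names are illustrative.

```python
theta = 0.8

# Coefficients of Y_{t-1}..Y_{t-4} in the AR form: .8, -.64, .512, -.4096.
ar_coefficients = [-((-theta) ** j) for j in range(1, 5)]

# Yule-Walker estimates for lags 1..4, taken from Output 6.4.
yule_walker = [0.789072, -0.58225, 0.308989, -0.23923]

signs_match = all((a > 0) == (b > 0) for a, b in zip(ar_coefficients, yule_walker))

# Theoretical ACF of an MA(1): nonzero at lag 1 only.
rho1 = theta / (1.0 + theta ** 2)

print([round(c, 4) for c in ar_coefficients])
print(signs_match, round(rho1, 4))
```

The estimates have the right signs but shrink faster than the theoretical coefficients, which is consistent with the text's description of only reasonable agreement.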

Note the canonical correlation analysis. Initially, consideration is given to adding $Y_{t+1|t}$ to the state 
vector containing $Y_t$. The canonical correlation, 0.454239, is an estimate of the second-largest 
canonical correlation between the set of variables $(Y_t, Y_{t+1})$ and the set of variables $(Y_t, Y_{t-1}, Y_{t-2}, 
Y_{t-3}, Y_{t-4})$. The first canonical correlation is always 1 because both sets of variables contain $Y_t$. The 
question is whether 0.4542 is an estimate of 0. PROC STATESPACE concludes that a correlation is 
0 if DIC<0. In this case, DIC=15.10916, so 0.4542 is not an estimate of 0. This implies that the 
portion of $Y_{t+1}$ that cannot be predicted from $Y_t$ is correlated with the past of the time series and, 
thus, that $Y_{t+1|t}$ should be included in the state vector. Another test statistic, Bartlett’s test, is 
calculated as 22.64698. The null hypothesis is that the second-highest canonical correlation is 0 
and the test statistic is to be compared to a chi-squared table with four degrees of freedom. The 
hypothesis of zero correlation is rejected, and Bartlett’s test agrees with DIC to include $Y_{t+1|t}$ in the 
state vector. 

Output 6.4 Modeling Simulated Data in PROC STATESPACE with the CANCORR and 
ITPRINT Options 


                      The STATESPACE Procedure 

                   Number of Observations   100 

                                      Standard 
               Variable        Mean      Error 
               Y           -0.40499   1.322236 

          Information Criterion for Autoregressive Models 

      Lag=0      Lag=1      Lag=2      Lag=3      Lag=4      Lag=5 
   55.86487   27.05937   13.52724    13.8878   9.994428   11.95021 

      Lag=6      Lag=7      Lag=8      Lag=9     Lag=10 
    12.7582   10.71094   12.64703    14.5518   16.48957 

              Schematic Representation of Correlations 

   Name/Lag 0 1 2 3 4 5 6 7 8 9 10 
   Y + + ...... + . 

   + is > 2*std error, - is < -2*std error, . is between 

          Schematic Representation of Partial Autocorrelations 

   Name/Lag 1 2 3 4 5 6 7 8 9 10 
   Y + - . -. 

   + is > 2*std error, - is < -2*std error, . is between 

               Yule-Walker Estimates for Minimum AIC 

             --Lag=1-   --Lag=2-   --Lag=3-   --Lag=4- 
                    Y          Y          Y          Y 
   Y         0.789072   -0.58225   0.308989   -0.23923 

                  Canonical Correlations Analysis 

                              Information        Chi 
     Y(T;T)     Y(T+1;T)        Criterion     Square     DF 
          1     0.454239         15.10916   22.64698      4 

                                         Information        Chi 
     Y(T;T)     Y(T+1;T)     Y(T+2;T)      Criterion     Square     DF 
          1     0.489796     0.208571       -1.55238    4.38091      3 

        Selected Statespace Form and Preliminary Estimates 

                          State Vector 
                      Y(T;T)     Y(T+1;T) 

                 Estimate of Transition Matrix 
                           0            1 
                    -0.16297     0.290093 

                  Input Matrix for Innovation 
                               1 
                        0.789072 

                 Variance Matrix for Innovation 
                        1.020144 

         Iterative Fitting: Maximum Likelihood Estimation 

   Iter  Half  Determinant  Lambda      F(2,1)      F(2,2)      G(2,1)  Sigma(1,1) 
      0     0     1.040261     0.1  -0.1629742  0.29009252  0.78907196  1.04026133 
      1     0     1.025047    0.01  -0.1157232  0.20149505  0.83329356  1.02504685 
      2     0     1.020305   0.001  -0.0570093  0.12653818  0.84215994  1.02030521 
      3     0     1.020245  0.0001   -0.051494  0.11987985   0.8409064  1.02024527 
      4     2     1.020245   0.001  -0.0515121   0.1198994  0.84093914  1.02024507 

           Maximum likelihood estimation has converged. 

            Selected Statespace Form and Fitted Model 

                          State Vector 
                      Y(T;T)     Y(T+1;T) 

                 Estimate of Transition Matrix 
                           0            1 
                    -0.05151     0.119899 

                  Input Matrix for Innovation 
                               1 
                        0.840939 

                 Variance Matrix for Innovation 
                        1.020245 

                      Parameter Estimates 

                                  Standard 
   Parameter     Estimate            Error     t Value 
   F(2,1)        -0.05151         0.132797       -0.39 
   F(2,2)        0.119899         0.151234        0.79 
   G(2,1)        0.840939         0.099856        8.42 


Now consider the portion of $Y_{t+2}$ that you cannot predict from $Y_t$ and $Y_{t+1|t}$. If this portion is 
correlated with the past of the series, you can produce a better predictor of the future than one that 
uses only $Y_t$ and $Y_{t+1|t}$. Add $Y_{t+2|t}$ to the state vector unless the third-highest canonical correlation 
between the set $(Y_t, Y_{t+1}, Y_{t+2})$ and the set $(Y_t, Y_{t-1}, Y_{t-2}, \ldots, Y_{t-4})$ is 0. The estimate of the 
third-highest canonical correlation is 0.208571. PROC STATESPACE assumes that 0.208571 is just an 
estimate of 0 because DIC is negative (-1.55238). This means that once you have predicted $Y_{t+2}$ 
from $Y_t$ and $Y_{t+1|t}$, you have the best predictor available. 

The past data do not improve the forecast. Thus, $Y_{t+2|t}$ is not added to the state vector. Bartlett's test 
statistic, 4.38091, is not significant compared to a chi-squared table with three degrees of 
freedom (with a critical value of 7.81). 

Again, the two tests agree that $Y_{t+2|t}$ is a linear combination of $Y_t$ and $Y_{t+1|t}$. Thus, the only information 
you need to predict arbitrarily far into the future is in 

$$Z_t = (Y_t, Y_{t+1|t})'$$

When you compare this to the theoretical analysis of an MA(1), you see that PROC STATESPACE 
has correctly identified the state vector as having two elements. The theoretical analysis gives the 
state space representation as 

$$Z_{t+1} = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} Z_t + \begin{pmatrix} 1 \\ .8 \end{pmatrix} e_{t+1}$$

PROC STATESPACE estimates these matrices to be 

$$F = \begin{pmatrix} 0 & 1 \\ -.163 & .290 \end{pmatrix} \quad \text{and} \quad G = \begin{pmatrix} 1 \\ .79 \end{pmatrix}$$

initially and 

$$F = \begin{pmatrix} 0 & 1 \\ -.051 & .12 \end{pmatrix} \quad \text{and} \quad G = \begin{pmatrix} 1 \\ .841 \end{pmatrix}$$

finally. 

Note that the t statistics on F(2,1) and F(2,2) are, as expected, not significant. The true entries of 
F are zeros in those positions. Finally, observe the nonlinear search beginning with the initial 
values in Output 6.4 and then moving to the final values in Output 6.4. Note that $|\hat{\Sigma}|$ 
decreases at each step. 
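
The two decision rules used in this walkthrough can be summarized in a short sketch. It applies the DIC sign rule and a 5% Bartlett comparison to the values from Output 6.4. The critical value 7.81 for three degrees of freedom is quoted in the text; 9.49 for four degrees of freedom is the usual table value and is an addition here. The function names are illustrative and do not correspond to PROC STATESPACE options.

```python
# 5% upper-tail chi-square critical values (7.81 is quoted in the text; 9.49 is
# the standard table value for 4 degrees of freedom, added for this sketch).
CHI2_CRITICAL_5PCT = {3: 7.81, 4: 9.49}


def include_by_dic(dic):
    """A canonical correlation is treated as 0 when DIC < 0."""
    return dic > 0


def include_by_bartlett(chi_square, df):
    """Reject a zero canonical correlation when the statistic exceeds the critical value."""
    return chi_square > CHI2_CRITICAL_5PCT[df]


# (candidate, DIC, Bartlett chi-square, degrees of freedom) from Output 6.4.
steps = [("Y(T+1;T)", 15.10916, 22.64698, 4),
         ("Y(T+2;T)", -1.55238, 4.38091, 3)]

decisions = {name: (include_by_dic(dic), include_by_bartlett(chi, df))
             for name, dic, chi, df in steps}
print(decisions)
```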

To force the correct form on the matrix F, use the RESTRICT statement: 

   PROC STATESPACE ITPRINT COVB DATA=TEST2;
      RESTRICT F(2,1)=0 F(2,2)=0;
      VAR Y;
   RUN;

The RESTRICT statement may also include restrictions on the entries of G. (See Output 6.5.) 

As requested, the bottom row of F has been set to 0 0. The initial G matrix and the final G 
matrix are close to the theoretical matrix, namely G = (1 .8)'. The COVB option requests the 
variance-covariance matrix of the parameter estimates, which is a scalar in the case of a single 
parameter estimate. 





Output 6.5 Modeling Simulated Data in PROC STATESPACE with the RESTRICT Statement 

The STATESPACE Procedure 
Number of Observations 100 




                          Standard
Variable        Mean         Error

Y           -0.40499      1.322236


Information Criterion for Autoregressive Models 

Lag=0 Lag=1 Lag=2 Lag=3 Lag=4 Lag=5 Lag=6 Lag=7 Lag=8 Lag=9 

55.86487 27.05937 13.52724 13.8878 9.994428 11.95021 12.7582 10.71094 12.64703 14.5518 

Information 
Criterion for 
Autoregressive 
Models 

Lag=10 

16.48957 

Schematic Representation of Correlations 
Name/Lag 01 23456789 10 

Y + + ...... + . 

+ is > 2*std error, - is < -2*std error, . is between 

Schematic Representation of Partial Autocorrelations 
Name/Lag 123456789 10 

Y + - . -. 

+ is > 2*std error, - is < -2*std error, . is between 

Yule-Walker Estimates for Minimum AIC 

--Lag=1- --Lag=2- --Lag=3- --Lag=4- 

Y Y Y Y 

Y 0.789072 -0.58225 0.308989 -0.23923 

Selected Statespace Form and Preliminary Estimates 

State Vector 
Y(T;T) Y(T+1;T) 

Estimate of Transition Matrix 

0 1 

O 0 0 

Input Matrix for Innovation 
1 

© 0.789072 

Variance Matrix for Innovation 
1.020144 

Iterative Fitting: Maximum Likelihood Estimation 


Iter   Half   Determinant   Lambda       G(2,1)   Sigma(1,1)

   0      0       1.02648      0.1   0.78907196    1.0264803
   1      0      1.026474     0.01   0.77926993   1.02647358
   2      0      1.026445    0.001   0.78027074   1.02644522


WARNING: No improvement after 10 step halvings. Convergence has 
been assumed. 

Selected Statespace Form and Fitted Model 
State Vector 
Y(T;T) Y(T+1;T) 

Estimate of Transition Matrix 

0 1 

0 0 

Input Matrix for Innovation 

1 

© 0.780271 

Variance Matrix for Innovation 


1.026445 


Parameter Estimates 

Standard 

Parameter Estimate Error t Value 

G(2,1) 0.780271 0.062645 12.46 

Covariance of Parameter Estimates 

G(2, 1 ) 

© G(2,1) 0.0039244 

Correlation of Parameter Estimates 
G(2,1) 

G(2,1) 1.00000 


You can find other options for PROC STATESPACE in the SAS/ETS User's Guide. 

It is dangerous to ignore the autocorrelations. The theory behind PROC STATESPACE assumes the 
input series are stationary. You have no guarantee of a reasonable result if you put nonstationary 
series into PROC STATESPACE. Often, you see almost all plus signs in the ACF diagram, which 
indicates a very slow decay and, consequently, possible nonstationarity. Differencing is specified 
exactly as in PROC ARIMA. For example, the following SAS statements specify a first and span 
12 difference to be applied to Y: 

PROC STATESPACE; 

VAR Y(1,12); 

RUN; 

The FORM statement is used to specify a form for the state vector. This statement can be helpful if 
you want to specify a state vector different from what DIC automatically chooses (for example, 
Bartlett’s test may give a different result than DIC, and you may prefer Bartlett's test). For example, 
the statements 

PROC STATESPACE; 

VAR X Y; 

FORM X 2 Y 1; 

RUN; 

specify the state vector as 

$$Z_t = (X_t, Y_t, X_{t+1|t})'$$

Now consider an interesting data set that cannot be modeled correctly as a transfer function because 
of feedback. The data are counts of mink and muskrat pelts shipped to Europe from Canada by the 
Hudson's Bay Company. The logarithms are analyzed, and both the logarithms and the original data 
are plotted in Output 6.6. 
You have an increasing, seemingly linear trend in the data plots. PROC REG is appropriate for 
detrending the data if the trend is due to increased trapping and does not reveal anything about the 
relationship between these two species. In that case, the dynamic relationship between mink 
(predator) and muskrat (prey) is best displayed in residuals from the trend. Another approach is to 
difference the two data series and analyze the resulting changes in pelt numbers. The approach you 
choose depends on the true nature of the series. The question becomes whether it is a unit root 
process or a time trend plus stationary error process. The regression detrending approach is used here 
simply to display the technique and is not necessarily recommended over differencing. The following 
SAS code detrends the logged data and submits the detrended data (residuals) to PROC 
STATESPACE for analysis: 

DATA DETREND; 

SET MINKMUSK; 

T+1; 

RUN; 

PROC REG DATA=DETREND NOPRINT; 

MODEL LMINK LMUSKRAT=T; 

OUTPUT OUT=RESID R=RMINK RMUSKRAT; 

RUN; 

PROC STATESPACE NOCENTER DATA=RESID; 

VAR RMINK RMUSKRAT; 

TITLE "HUDSON'S BAY FUR TRAPPING RECORDS 1842-1890"; 

TITLE2 'RESIDUALS FROM LINEAR TREND'; 

RUN; 

The results are shown in Output 6.7. 

Because the data are detrended, you do not need to subtract the mean. Thus, you can specify 
NOCENTER. Note that the ACF O schematic plot shows several plus and minus signs but not 
enough to indicate nonstationarity. (However, it is notoriously difficult to detect nonstationarity 
visually in a series that has been detrended.) 

Note that the ACF at lag 1 is represented by a matrix of plus and minus signs because you have a 
bivariate series. If you consider a bivariate series in general as $(X_t, Y_t)$ and the lag 1 matrix 

$$\begin{array}{c|cc} & X_{t-1} & Y_{t-1} \\ \hline X_t & + & + \\ Y_t & - & + \end{array}$$

then the + in the upper-left corner indicates a positive covariance between $X_t$ and $X_{t-1}$. The + in the 
upper-right corner indicates a positive covariance between $X_t$ and $Y_{t-1}$. The - in the lower-left corner 
indicates a negative covariance between $Y_t$ and $X_{t-1}$ and, finally, the + in the lower-right corner 
indicates a positive covariance between $Y_t$ and $Y_{t-1}$. In terms of the current example, $X_t$ represents 
RMINK, and $Y_t$ represents RMUSKRAT, so the signs make sense with respect to the predator-prey 
relationship. 

The PACF © looks like that of a vector AR of dimension 2 and order 1 (one lag). Thus, you expect 
the initial AR approximation © to have only one lag and to be very close to the final model © 
chosen by PROC STATESPACE. This is, in fact, the case here. 
The state vector © is simply the vector of inputs, so the vector ARMA model is easily derived from 
the state space model ©. When X=RMINK (mink residuals at time t) and Y=RMUSKRAT (muskrat 
residuals at time t) are used, the state vector is simply 

$$Z_t = (X_t, Y_t)'$$

and the model is 

$$\begin{matrix} \text{RMINK} \\ \text{RMUSKRAT} \end{matrix} \quad \begin{bmatrix} X_{t+1} \\ Y_{t+1} \end{bmatrix} = \begin{bmatrix} .569 & .298 \\ -.468 & .627 \end{bmatrix} \begin{bmatrix} X_t \\ Y_t \end{bmatrix} + \begin{bmatrix} e_{1,t+1} \\ e_{2,t+1} \end{bmatrix}$$

Here, for example, the number -.468 indicates that large mink values (predator) at time t are 
associated with small muskrat values (prey) at time t+1. $X_t$ and $Y_t$ are not related by a transfer 
function because you can use the t statistic © to reject the hypothesis that -.468 is an estimate of 0. 
That is, each series is predicted by using lagged values of the other series. The transfer function 
methodology in PROC ARIMA is not appropriate. 
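
The feedback in the fitted model can be made concrete by applying the estimated transition matrix by hand. The following sketch assumes the RESID data set created above is still available and types in the four estimates from Output 6.7; the data set name ONESTEP and the new variable names are hypothetical.

```sas
/* One-step-ahead predictions from the fitted transition matrix. */
/* Each prediction uses the previous value of BOTH series, which */
/* is the feedback that rules out a transfer function model.     */
DATA ONESTEP;
   SET RESID;
   XLAG = LAG(RMINK);            /* mink residual at time t-1    */
   YLAG = LAG(RMUSKRAT);         /* muskrat residual at time t-1 */
   PMINK    =  0.568749*XLAG + 0.298262*YLAG;
   PMUSKRAT = -0.46839*XLAG  + 0.627158*YLAG;
   EMINK    = RMINK - PMINK;         /* one-step prediction errors */
   EMUSKRAT = RMUSKRAT - PMUSKRAT;
RUN;

PROC PRINT DATA=ONESTEP;
   VAR RMINK PMINK EMINK RMUSKRAT PMUSKRAT EMUSKRAT;
RUN;
```

The first observation has missing predictions because no lagged values exist for it.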

Output 6.7 Using PROC REG to Detrend the Data and PROC STATESPACE to Analyze the Residuals 

HUDSON'S BAY FUR TRAPPING RECORDS 1842-1890 
RESIDUALS FROM LINEAR TREND 
The STATESPACE Procedure 

Number of Observations 49 

Standard 
Variable Error 

RMINK 0.367741 

RMUSKRAT 0.397009 

HUDSON'S BAY FUR TRAPPING RECORDS 1842-1890 
RESIDUALS FROM LINEAR TREND 
The STATESPACE Procedure 

Information Criterion for Autoregressive Models 

Lag=0 Lag=1 Lag=2 Lag=3 Lag=4 Lag=5 Lag=6 Lag=7 Lag=8 Lag=9 

-188.624 -251.778 -245.705 -242.548 -236.654 -230.49 -229.25 -227.127 -225.203 -221.891 

Information 
Criterion for 
Autoregressive 
Models 

Lag=10 


-221.112 

Schematic Representation of Correlations 

Name/Lag 01 23456789 10 

RMINK +. ++ .+ .+ +- +. 

RMUSKRAT .+ -+ -. +. +. +. O 

+ is > 2*std error, - is < -2*std error, . is between 


Schematic Representation of Partial Autocorrelations 

Name/Lag 1 23456789 10 

RMINK +. 

RMUSKRAT -+ .© 

+ is > 2*std error, - is < -2*std error, . is between 


Yule-Walker Estimates 
for Minimum AIC 

.Lag=1. 

RMINK RMUSKRAT 

RMINK 0.568749 0.298262 © 

RMUSKRAT -0.46839 0.627158 

HUDSON'S BAY FUR TRAPPING RECORDS 1842-1890 
RESIDUALS FROM LINEAR TREND 
The STATESPACE Procedure 

Selected Statespace Form and Preliminary Estimates 
© State Vector 

RMINK(T;T) RMUSKRAT(T;T) 


Estimate of Transition Matrix 

0.568749 0.298262 

© -0.46839 0.627158 


Input Matrix for Innovation 


1 0 

0 1 








Variance Matrix for Innovation 

0.079131 0.002703 

0.002703 0.063072 


Maximum likelihood estimation has converged. 

HUDSON'S BAY FUR TRAPPING RECORDS 1842-1890 
RESIDUALS FROM LINEAR TREND 
The STATESPACE Procedure 
Selected Statespace Form and Fitted Model 

State Vector 

RMINK(T;T) RMUSKRAT(T;T) 


Estimate of Transition Matrix 

0.568749 0.298262 

-0.46839 0.627158 


Input Matrix for Innovation 

1 0 

0 1 


Variance Matrix for Innovation 

0.079131 0.002703 

0.002703 0.063072 


Parameter Estimates 




                           Standard       ©
Parameter     Estimate        Error   t Value

F(1,1)        0.568749     0.109253      5.21
F(1,2)        0.298262     0.101203      2.95
F(2,1)        -0.46839     0.097544     -4.80
F(2,2)        0.627158     0.090356      6.94
If you had (mistakenly) decided to fit a transfer function, you could have fit an AR(2) to the mink 
series and computed the prewhitened cross-correlations. You observe a somewhat subtle warning in 
the cross-correlations plot—namely, that there are nonzero correlations at both positive and negative 
lags as shown in Output 6.8. 


Output 6.8 Cross-Correlations 


HUDSON'S BAY FUR TRAPPING RECORDS 1842-1890 

The ARIMA Procedure 


Cross-Correlations 


Lag    Covariance   Correlation   Plot

-10     0.0023812       0.02848             |*
 -9     -0.015289      -0.18288         ****|
 -8     -0.033853      -0.40494     ********|
 -7    -0.0089179      -0.10668           **|
 -6    -0.0012433      -0.01487             |
 -5    -0.0010694      -0.01279             |
 -4     0.0089324       0.10685             |**
 -3      0.018517       0.22149             |****
 -2    -0.0007918      -0.00947             |
 -1      0.022822       0.27299             |*****
  0      0.010153       0.12145             |**
  1     -0.040174      -0.48055   **********|
  2    -0.0095530      -0.11427           **|
  3    -0.0090755      -0.10856           **|
  4    -0.0001946      -0.00233             |
  5     0.0048362       0.05785             |*
  6      0.012202       0.14596             |***
  7      0.013651       0.16329             |***
  8      0.010270       0.12284             |**
  9      0.015405       0.18427             |****
 10    -0.0067477      -0.08071           **|


marks two standard errors 


Both variables have been prewhitened by the following filter: 
Prewhitening Filter 
Autoregressive Factors 

Factor 1: 1 - 0.78452 B**(1) + 0.29134 B**(2) 
7.1 Periodic Data: Introduction 

The modeling of time series data using sinusoidal components is called spectral analysis. The main 
tool here is the periodogram. A very simple model appropriate for spectral analysis is a mean plus a 
sinusoidal wave plus white noise: 

$$Y_t = \mu + \alpha\sin(\omega t + \delta) + e_t = \mu + \alpha(\sin(\delta)\cos(\omega t) + \cos(\delta)\sin(\omega t)) + e_t$$

where the formula sin(A + B) = cos(A)sin(B) + sin(A)cos(B) for the sine of the sum of two angles 
has been applied. The function $\mu + \alpha\sin(\omega t + \delta)$ oscillates between $\mu - \alpha$ and $\mu + \alpha$ in a smooth and 
exactly periodic fashion. The number $\alpha$ is called the amplitude. The number $\delta$, in radians, is called 
the phase shift or phase angle. The number $\omega$ is called the frequency and is also measured in 
radians. If an arc of length r is measured along the circumference of a circle whose radius is r, then 
the angle obtained by connecting the arc's ends to the circle center is one radian. There are $2\pi$ 
radians in a full 360-degree circle, and one radian is thus $360/(2\pi) = 360/6.2832 = 57.3$ degrees. A 
plot of $\mu + \alpha\sin(\omega t + \delta)$ versus t is a sine wave that repeats every $2\pi/\omega$ time units; that is, the 
period is $2\pi/\omega$. A sinusoid of period 12 would "go through" $\omega = 2\pi/12 = 0.52$ radians per 
observation. 

Letting $A = \alpha\sin(\delta)$ and $B = \alpha\cos(\delta)$, we see that 

$$Y_t = \mu + A\cos(\omega t) + B\sin(\omega t) + e_t$$
This is a very nice expression in that, if $\omega$ is known, variables $\sin(\omega t)$ and $\cos(\omega t)$ can be 
constructed in a DATA step and the parameters $\mu$, A, and B can be estimated by ordinary least 
squares as in PROC REG. From the expressions $A = \alpha\sin(\delta)$ and $B = \alpha\cos(\delta)$ it is seen that 
$A/B = \tan(\delta)$ and 

$$\sqrt{A^2 + B^2} = \alpha\sqrt{\sin^2(\delta) + \cos^2(\delta)} = \alpha,$$

so phase angle and amplitude estimates can be constructed 
from estimates of A and B. 


7.2 Example: Plant Enzyme Activity 

As an example, Chiu-Yueh Hung, in the Department of Genetics at North Carolina State University, 
collected observations on leaf enzyme activity Y every 4 hours over 5 days. There are 6 observations 
per day and 30 observations in all. Each observation is an average of several harvested leaves. The 
researcher anticipated a 12-hour enzyme cycle, which corresponds to 3 observations. To focus this 
discussion on periodic components, the original data have been detrended using linear regression. 

First read in the data, creating the sine and cosine variables for a period 3 (frequency $2\pi/3$ radians 
per observation), and then regress Y on these two variables. 

DATA PLANTS; 
   TITLE "ENZYME ACTIVITY"; 
   TITLE2 "(DETRENDED)"; 
   DO T=1 TO 30; 
      INPUT Y @@; 
      PI = 3.1415926; 
      S1=SIN(2*PI*T/3);      /* sine column, period 3   */ 
      C1=COS(2*PI*T/3);      /* cosine column, period 3 */ 
      OUTPUT; 
   END; 
CARDS; 
265.945 290.385 251.099 285.870 379.370 301.173 
283.096 306.199 341.696 246.352 310.648 276.348 
234.870 314.744 261.363 321.780 313.289 253.460 
307.988 303.909 284.128 252.886 317.432 287.160 
213.168 308.458 296.351 283.666 333.544 316.998 
; 
RUN; 

PROC REG DATA=PLANTS; 
   MODEL Y = S1 C1 / SS1; 
   OUTPUT OUT=OUT1 PREDICTED=P RESIDUAL=R; 
RUN; 


The analysis of variance table is shown in Output 7.1. 
Output 7.1 Plant Enzyme Sinusoidal Model 

The REG Procedure 
Model: MODEL1 
Dependent Variable: Y 

Analysis of Variance 

                             Sum of          Mean 
Source              DF      Squares        Square   F Value   Pr > F 

Model                2        11933    5966.44591      7.09   0.0034 
Error               27        22724     841.62128 
Corrected Total     29        34657 

Root MSE            29.01071    R-Square    0.3443 
Dependent Mean     291.44583    Adj R-Sq    0.2957 
Coeff Var            9.95407 

Parameter Estimates 

                   Parameter    Standard 
Variable     DF     Estimate       Error   t Value   Pr > |t|    Type I SS 

Intercept     1    291.44583     5.29661     55.03     <.0001      2548220 
s1            1    -27.84889     7.49053     -3.72     0.0009        11633 
c1            1     -4.46825     7.49053     -0.60     0.5558    299.47884 

The sum of squares for the intercept is $n\bar{Y}^2 = 30(291.44583^2) = 2548220$, and the sum of squares for 
the model, which is the sum of squares associated with frequency $\omega = 2\pi/3$, is 11933 and has 2 
degrees of freedom. It is seen to be statistically significant based on the F test, F = 7.09 (P = .0034). 
It appears that the sine term is significant but not the cosine term; however, such a splitting of the 
two degree of freedom sum of squares is not meaningful in that, if t = 0 had been used as the first 
time index rather than t = 1, both would have been significant. The sum of squares 11933 would not 
change with any such time shift. The sum of squares 11933 associated with frequency $\omega = 2\pi/3$ is 
called the periodogram ordinate at that frequency. A given set of data may have important 
fluctuations at several frequencies. Output 7.2 shows the actual and fitted values for the plant 
enzyme data. 
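
The amplitude and phase angle implied by the fit can be recovered from the two slope estimates, as described in Section 7.1. The following sketch simply types in the estimates from Output 7.1; the data set and variable names are hypothetical, and the use of ATAN2 to pick the quadrant of the phase angle is an assumption about how you would want the angle reported.

```sas
/* Amplitude and phase from the regression coefficients.        */
/* A multiplies the cosine column (C1), B the sine column (S1). */
DATA AMPPHASE;
   A = -4.46825;                     /* estimate for C1 */
   B = -27.84889;                    /* estimate for S1 */
   AMPLITUDE = SQRT(A**2 + B**2);    /* alpha           */
   PHASE     = ATAN2(A, B);          /* delta: sin(delta)=A/alpha, cos(delta)=B/alpha */
   SSMODEL   = (30/2)*AMPLITUDE**2;  /* (n/2)(A**2 + B**2), the model sum of squares  */
   PUT AMPLITUDE= PHASE= SSMODEL=;
RUN;
```

The amplitude comes out at about 28.2, and SSMODEL reproduces the model sum of squares 11933 up to rounding of the typed-in estimates.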
Output 7.2 Data and Sinusoidal Predictions 

[Figure: observed enzyme activity and fitted sinusoid plotted against time.] 



7.3 PROC SPECTRA Introduced 

Periodogram ordinates are calculated for a collection of frequencies known as the Fourier 
frequencies. With $2m + 1$ observations Y, there are m of these, each with 2 degrees of freedom, so 
that a multiple regression of Y on the 2m sine and cosine columns fits the data perfectly; that is, 
there are no degrees of freedom for error. The Fourier frequencies are $2\pi j/n$, where j runs from 1 
to m. For each j, two columns $\sin(2\pi jt/n)$ and $\cos(2\pi jt/n)$, $t = 1, 2, \ldots, n$, are created. The model 
sum of squares, when the data are regressed on these two columns, is the jth periodogram ordinate. 
At the jth Fourier frequency, the sine and cosine run through j cycles in the time period covered by 
the data. If n = 2m, an even number, there are still m periodogram ordinates and j still runs from 1 to 
m, but when j = m the frequency $2\pi j/n$ becomes $2\pi m/(2m) = \pi$ and $\sin(\pi t) = 0$. Thus for even n, 
the last Fourier frequency has only one degree of freedom associated with it, arising from the cosine 
term, $\cos(\pi t) = (-1)^t$, only. It does not matter whether a multiple regression using all the Fourier sine 
and cosine columns or m bivariate regressions, one for each j, are run. The columns are all 
orthogonal to each other and the sums of squares (periodogram ordinates) are the same either way. 
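
The orthogonality claim can be checked directly by regression. The following sketch assumes the PLANTS data set from Section 7.2 (n = 30, with time index T) and builds the columns for two Fourier frequencies, j = 8 and j = 10; the data set name FOURIER, the column names, and the TEST labels are hypothetical.

```sas
/* Sine and cosine columns at Fourier frequencies 2*pi*j/30     */
/* for j = 8 and j = 10 (j = 10 is the period 3 frequency).     */
DATA FOURIER;
   SET PLANTS;
   PI  = CONSTANT('PI');
   S8  = SIN(2*PI*8*T/30);    C8  = COS(2*PI*8*T/30);
   S10 = SIN(2*PI*10*T/30);   C10 = COS(2*PI*10*T/30);
RUN;

/* Because the columns are orthogonal, the Type I sums of       */
/* squares for each sine-cosine pair do not depend on the order */
/* of the terms, and each pair sums to a periodogram ordinate.  */
PROC REG DATA=FOURIER;
   MODEL Y = S8 C8 S10 C10 / SS1;
   J8:  TEST S8=0,  C8=0;
   J10: TEST S10=0, C10=0;
RUN;
```

The two Type I sums of squares for S10 and C10 should add to the model sum of squares 11933 from Output 7.1. Reordering the terms in the MODEL statement should leave every pair total unchanged.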

PROC SPECTRA calculates periodogram ordinates at all the Fourier frequencies. With the 30 plant 
enzyme measurements there are 15 periodogram ordinates, the last having 1 degree of freedom and 
the others 2 each. Since $2\pi(10)/30 = 2\pi/3$, the Fourier frequency for j = 10 should have periodogram 
ordinate equal to the previously computed model sum of squares, 11933. You might expect the other 
periodogram ordinates to add to 22724, the error sum of squares. However, PROC SPECTRA 
associates twice the correction term, $2n\bar{Y}^2 = 5096440$, with frequency 0 and twice the sum of 
squares at frequency $\pi$ (when n is even) with that frequency, so one must divide the frequency $\pi$ 
ordinate by 2 to get its contribution to the error sum of squares from regression. This doubling is 
done here because, after doubling some, division of all ordinates by 2 becomes the same as dividing 
unadjusted numbers by their degrees of freedom. The frequency 0 ordinate is replaced with 0 when 
the option ADJMEAN is used in PROC SPECTRA. These ideas are illustrated here for the plant 
enzyme data: 

PROC SPECTRA DATA=PLANTS OUT=OUT2 COEFF; 

VAR Y; 

RUN; 

DATA OUT2; SET OUT2; SSE = P_01; 

TITLE J=CENTER "ENZYME DATA"; 

IF PERIOD=3 OR PERIOD=. THEN SSE=0; 

IF ROUND (FREQ, .0001) = 3.1416 THEN SSE = .5*P_01; 

RUN; 

PROC PRINT DATA=OUT2; 

SUM SSE; 

RUN; 

The option COEFF in PROC SPECTRA adds the regression coefficients (COS_01 and SIN_01) to the 
data. Looking at the period 3 line of Output 7.3, you see the regression sum of squares 
11933 = P_01, which matches the regression output. The coefficients A = -21.88 and B = 17.79 are 
those that would have been obtained if time t had been labeled as 0, 1, ..., 29 (as PROC SPECTRA 
does) instead of 1, 2, ..., 30. Any periodogram ordinate with 2 degrees of freedom can be computed as 
$(n/2)(A^2 + B^2)$, where A and B are its Fourier coefficients. You see that 
$(30/2)((-21.884)^2 + (17.7941)^2) = 11933$. (See Output 7.3.) 
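
The same identity can be checked for every 2 df ordinate at once. This sketch assumes the OUT2 data set created by the PROC SPECTRA step above, with the variables COS_01, SIN_01, and P_01 shown in Output 7.3; the data set name CHECK and the variables P_CHECK and DIFF are hypothetical.

```sas
/* Recompute each 2 df periodogram ordinate as (n/2)(A**2+B**2) */
/* from its Fourier coefficients and compare it with P_01.      */
DATA CHECK;
   SET OUT2;
   /* keep only the 2 df ordinates: drop frequency 0 and pi */
   IF FREQ > 0 AND ROUND(FREQ, .0001) NE 3.1416;
   P_CHECK = (30/2)*(COS_01**2 + SIN_01**2);
   DIFF    = P_01 - P_CHECK;
RUN;

PROC PRINT DATA=CHECK;
   VAR FREQ PERIOD P_01 P_CHECK DIFF;
RUN;
```

DIFF should be zero apart from floating-point rounding.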


Output 7.3 OUT Data Set from PROC SPECTRA 

Enzyme activity 
(detrended) 

Obs      FREQ    PERIOD    COS_01     SIN_01          P_01        SSE 

  1   0.00000     .       582.892     0.0000    5096440.43       0.00 
  2   0.20944   30.0000     5.086     6.3073        984.78     984.78 
  3   0.41888   15.0000     1.249     6.6450        685.74     685.74 
  4   0.62832   10.0000    -6.792   -11.9721       2841.87    2841.87 
  5   0.83776    7.5000    -1.380    -9.6111       1414.15    1414.15 
  6   1.04720    6.0000   -10.035    -7.5685       2369.85    2369.85 
  7   1.25664    5.0000     0.483    -9.3467       1313.91    1313.91 
  8   1.46608    4.2857     4.288    -1.0588        292.58     292.58 
  9   1.67552    3.7500    17.210    -2.7958       4560.07    4560.07 
 10   1.88496    3.3333    -6.477    -0.1780        629.66     629.66 
 11   2.09440    3.0000   -21.884    17.7941      11932.89       0.00 
 12   2.30383    2.7273    -5.644    -2.7635        592.35     592.35 
 13   2.51327    2.5000     6.760    13.8749       3573.13    3573.13 
 14   2.72271    2.3077    -9.830   -10.0278       2957.94    2957.94 
 15   2.93215    2.1429    -0.022    -5.4249        441.45     441.45 
 16   3.14159    2.0000     2.973     0.0000        132.60      66.30 
                                                             ======== 
                                                             22723.78 
PROC SPECTRA automatically creates the column FREQ of Fourier frequencies equally spaced in 
the interval 0 to $\pi$ and the column PERIOD of corresponding periods. It is customary to plot the 
periodogram versus frequency or period, omitting frequency 0. 


Output 7.4 Periodogram with a Single Important Frequency 

[Figure: ENZYME PERIODOGRAM, periodogram ordinate plotted against frequency.] 

Output 7.4 shows the unusually large ordinate 11933 at the anticipated frequency of one cycle per 
12 hours—that is, one cycle per 3 observations. The researcher was specifically looking for such a 
cycle and took sufficient observations to make the frequency of interest a Fourier frequency. If the 
important frequency is not a Fourier frequency, the periodogram ordinates with frequencies near the 
important one will be large. Of course, by creating their own sine and cosine columns, researchers 
can always investigate any frequency using regression. The beauty of the Fourier frequencies is the 
orthogonality of the resulting collection of regression columns (sine and cosine functions). 


7.4 Testing for White Noise 

For a normal white noise series with variance $\sigma^2$, the periodogram ordinates are independent and, 
when divided by $\sigma^2$, have chi-square distributions with 2 degrees of freedom (df). These properties 
lead to tests of the white noise null hypothesis. 

You are justified in using an F test for the single sinusoid plus white noise model when the 
appropriate $\omega$ is known in advance, as in Section 7.2. You would not be justified in testing the 
largest observed ordinate (just because it is the largest) with F. If you test for a period 3 component 
in multiple sets of white noise data (your null hypothesis), the F test statistic will have an F 
distribution. However, if you always test the largest ordinate whether or not it occurs at period 3, 
then this new F statistic will never be less than the F for period 3 and will usually be larger. Clearly 
this new "F" statistic cannot have the same F distribution. 

Fisher computed the distribution for the largest periodogram ordinate divided by the mean of all the 2 
df ordinates under the white noise null hypothesis. In the plant enzyme data, omission of the 1 df 
ordinate 132.6 gives Fisher’s kappa test statistic 11933/[(22723.8 -132.6/2 +11933)/14] = 4.83. 
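
The arithmetic behind Fisher's kappa can be reproduced from the periodogram data set. This sketch assumes the OUT2 data set from Section 7.3 (variables FREQ and P_01 as in Output 7.3); the column aliases M, MAXP, SUMP, and KAPPA are hypothetical names.

```sas
/* Fisher's kappa: largest 2 df ordinate divided by the mean of */
/* all 2 df ordinates. Frequency 0 and the 1 df ordinate at     */
/* frequency pi are excluded.                                   */
PROC SQL;
   SELECT COUNT(P_01)                        AS M,
          MAX(P_01)                          AS MAXP,
          SUM(P_01)                          AS SUMP,
          COUNT(P_01)*MAX(P_01)/SUM(P_01)    AS KAPPA
   FROM OUT2
   WHERE FREQ > 0 AND ROUND(FREQ, .0001) NE 3.1416;
QUIT;
```

With the 14 ordinates of Output 7.3 this is 14(11932.89)/34590.36, the value 4.83 quoted above.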

Fuller (1996) discusses this test along with the cumulative periodogram test. The latter uses $C_k$, 
which is the ratio of the sum of the first k periodogram ordinates to the sum of all the ordinates (again 
dropping any 1 df ordinate). The set of these $C_k$ should behave like an ordered sample from a 
uniform distribution if the data are white noise. Therefore a standard distributional test, like those in 
PROC UNIVARIATE, can be applied to these cumulative $C_k$ ratios, resulting in a test of the white 
noise null hypothesis. Traditionally the Kolmogorov-Smirnov test is applied. (See Fuller, page 363, 
for more details.) 

Interpolating in Fuller’s table of critical values for Fisher’s kappa with 14 ordinates gives 4.385 as 
the 10% and 4.877 as the 5% critical value. Our value 4.83 is significant at 10% but not quite at 5%. 
Therefore, if you were just searching for a large ordinate rather than focusing from the start on a 12- 
hour cycle, your evidence for a 12-hour cycle would be nowhere near as impressive. This illustrates 
the increase in statistical power that can be obtained when you know something about your subject 
matter. You obtain both white noise tests using the WHITETEST option, as shown in Output 7.5. 

PROC SPECTRA DATA= PLANTS WHITETEST; 

VAR Y; 

RUN; 


Output 7.5 Periodogram-Based White Noise Tests 

The SPECTRA Procedure 

Test for White Noise for Variable Y 

M-1            14 
Max(P(*))      11932.89 
Sum(P(*))      34590.36 

Fisher's Kappa: (M-1)*Max(P(*))/Sum(P(*)) 

Kappa          4.829682 

Bartlett's Kolmogorov-Smirnov Statistic: 
Maximum absolute difference of the standardized 
partial sums of the periodogram and the CDF of a 
uniform(0,1) random variable. 

Test Statistic 0.255984 


For 14 periodogram ordinates, tables of the Kolmogorov-Smirnov (K-S) statistic indicate that a value 
larger than about 0.36 would be needed for significance at the 5% level so that 0.256 is not big 
enough. Fisher’s test is designed to detect a single sinusoid buried in white noise and so would be 
expected to be more powerful under the model proposed here than the K-S test, which is designed to 
have some power against any departure from white noise. 
7.5 Harmonic Frequencies 

Just because a function is periodic does not necessarily mean it is a pure sinusoid. For example, the 
sum of a sinusoid of period k and another of period k/2 is a periodic function of period k but is not 
expressible as a single sinusoid. On the other hand, any periodic function of period k defined on the 
integers can be represented as the sum of sinusoids of period k, k/2, k/3, etc. For a fundamental 
period k, periods k/j for j = 2, 3, ... are called "harmonics." Harmonics affect the wave shape but 
not the period. A period of 2 is the shortest period detectable in a periodogram, and its associated 
frequency, $\pi$, is sometimes called the Nyquist frequency. Thus the plant enzyme measurements were 
not taken frequently enough to investigate harmonics of the fundamental frequency $2\pi/3$ (period 3). 
Even the first harmonic has period 3/2 < 2 and frequency $4\pi/3$, which exceeds the Nyquist 
frequency $\pi$. 

To further illustrate the idea of harmonics, imagine n = 36 monthly observations where there is a 
fundamental frequency $2\pi/12$ and possibly contributions from the harmonic frequencies $2(2\pi)/12$ 
and $3(2\pi)/12$ plus white noise. To fit the model you create three sine and three cosine columns. The 
sine column for the fundamental frequency would have tth entry $\sin(2\pi t/12)$ and would go through 
3 cycles in 36 observations. Now look at Output 7.6. 


Output 7.6 Fundamental and Harmonic Sinusoids 
Output 7.6 is a schematic representation of the regression X matrix just described and is interpreted 
as follows. On the left, a vertical column of dots represents the intercept column, a column of 1s. 
Just to its right is a wave that represents $\cos(2\pi t/12)$, and to its right is another wave representing 
$\sin(2\pi t/12)$. Run your finger down one of these two waves. Your finger cycles between one unit left 
and one unit right of the wave center line. There are three cycles in each of these two columns. 
Writing the deviations of dots from centers as numbers supplies the entries of the corresponding 
column of the X matrix. The two waves, or columns of X, currently under discussion will have 
regression coefficients $A_1$, $B_1$. By proper choice of these, the regression will exactly fit any sinusoid 
of frequency $2\pi/12$ regardless of its amplitude and phase. 

Similar comments apply to the other two pairs of waves, but note that, as you run your finger down 
any of these, the left-to-right oscillation is faster and thus there are more cycles: 36/6 = 6 for the 
middle pair and 36/4 = 9 for the rightmost pair, where 12/2 = 6 and 12/3 = 4 are the periods 
corresponding to the two harmonic frequencies. Three more pairs of columns, with periodicities 
12/4 = 3, 12/5, and 12/6 = 2, fill out a full set of harmonics for a period 12 function measured at 
integer time points. They would add 6 more columns for a total of 12 waves, seeming to contradict 
the fact that a period 12 function has 11, not 12, degrees of freedom. However, at period 12/6 = 2, 
the sine column becomes $\sin(2\pi t/2) = \sin(\pi t) = 0$ for all t. Such a column of 0s would, of course, be 
omitted, leaving 11 columns (11 degrees of freedom) plus an intercept column associated with the 
period 12 function. If 36 consecutive observations from any period 12 function were regressed on 
this 12 column X matrix, the fit would be perfect at the observed points but would not necessarily 
interpolate well between them. A perfect fit at the observation times would result even if the 
sequence $Y_t$ were repeated sets of six 1s followed by six -1s. The fitted values would exactly match 
the observed -1, 1 pattern at integer values of t, but interpolated values, say, at time t = 5.9, would 
not be restricted to -1 or 1. One might envision the harmonics as fine-tuning the wave shape as you 
move up through the higher harmonic frequencies (shorter period fluctuations). This motivates the 
statistical problem of separating the frequencies that contribute to the true process from those that are 
fitting just random noise so that a good picture of the wave shape results. Periodograms and 
associated tests are useful here. 

The following outputs are generated from a sinusoid of period k = 12 plus another at the first 
harmonic, period 12/2 = 6. Each sinusoid is the sum of a sine and cosine component, thus allowing 
an arbitrary phase angle. For interpolation purposes, sine and cosine terms are generated for t in 
increments of 0.1, but Y exists only at integer t. 
Output 7.7 shows three sets of fitted values. The sine and cosine at the fundamental frequency 
$\omega_1 = 2\pi/12$ are used to produce the fitted values on the left side. These fitted values do not capture 
the double peak in each interval of 12 time points, and they miss the low and high extremes. 
Including the first harmonic $\omega_2 = 2(2\pi/12)$ gives a better fit and gives an idea of what the 
data-generating function looks like. The fitted values on the right side are those coming from the 
fundamental and all harmonic frequencies $j(2\pi/12)$ for $j = 1, 2, \ldots, 6$, omitting the sine at j = 6. The 
minor wiggles there are due to the frequencies with j > 2. Adding all those extra parameters does 
not seem to have produced any useful new features in the fitted values. From PROC REG (see 
Output 7.9), the F test 1.53 © for frequencies with j = 3, 4, ..., 6 is not significant, and the Type I 
sums of squares for j = 1 and 2 are large enough that neither the j = 1 nor j = 2 frequencies can be 
omitted. Recall that you would not eliminate just a sine or cosine; they are treated as pairs. 
Rearrangement of terms or deletion of some terms would not affect the sums of squares here because 
the sine and cosine columns correspond to Fourier frequencies, so they are orthogonal to each other. 
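
The PROC REG step behind Output 7.9 is not listed in the text. The following is a sketch of one way to obtain a comparable table, assuming the COMPRESS data set holds the 36 integer-time observations with variables T and Y; the data set name HARMONIC, the column names, and the TEST label are assumptions chosen to match the labels in Output 7.9.

```sas
/* Sine and cosine columns for the fundamental (j = 1) and the  */
/* harmonics j = 2,...,6 of period 12. The sine at j = 6 is     */
/* identically 0 at integer T, so it is not created.            */
DATA HARMONIC;
   SET COMPRESS;
   PI = CONSTANT('PI');
   ARRAY S{5} S1-S5;
   ARRAY C{6} C1-C6;
   DO J = 1 TO 6;
      IF J < 6 THEN S{J} = SIN(2*PI*J*T/12);
      C{J} = COS(2*PI*J*T/12);
   END;
   DROP J PI;
RUN;

/* Joint F test that all terms for j = 3,...,6 are zero */
/* (7 numerator degrees of freedom).                    */
PROC REG DATA=HARMONIC;
   MODEL Y = S1 C1 S2 C2 S3 S4 S5 C3 C4 C5 C6 / SS1;
   HARMONICS: TEST S3=0, S4=0, S5=0, C3=0, C4=0, C5=0, C6=0;
RUN;
```

With 36 observations, an intercept, and 11 regressors, the denominator has 24 degrees of freedom, which agrees with Output 7.9.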

The following PROC SPECTRA code is used to generate Output 7.8 and Output 7.10. 

PROC SPECTRA DATA=COMPRESS P S ADJMEAN OUT=OUTSPECTRA; 

VAR Y; 

WEIGHTS 1 2 3 4 3 2 1; 

RUN; 
Output 7.8 Periodogram with Two Independent Frequencies 

[Figure: PERIODOGRAM, PERIOD 12 PLUS 1 HARMONIC. Periodogram of Y (vertical axis, 0 to 600) plotted against frequency from 0 to PI.] 


The periodogram, shown in Output 7.8, makes it quite clear that there are two dominant frequencies, 
$2\pi/12$ and its first harmonic, $4\pi/12$. The last few lines of the program deliver a smoothed version 
of the periodogram, shown in Output 7.10, that will be discussed in Section 7.9. Smoothing is not 
helpful in this particular example. 


Output 7.9 Regression Estimates and F Test 

Parameter Estimates 

                   Parameter    Standard 
Variable     DF     Estimate       Error   t Value   Pr > |t|     Type I SS 

Intercept     1     10.13786     0.14369     70.55     <.0001    3699.94503 
s1            1      5.33380     0.20321     26.25     <.0001     512.09042 
c1            1      0.10205     0.20321      0.50     0.6201       0.18745 
s2            1      3.90502     0.20321     19.22     <.0001     274.48586 
c2            1      1.84027     0.20321      9.06     <.0001      60.95867 
s3            1     -0.33017     0.20321     -1.62     0.1173       1.96220 
s4            1     -0.41438     0.20321     -2.04     0.0526       3.09082 
s5            1     -0.17587     0.20321     -0.87     0.3954       0.55672 
c3            1     -0.27112     0.20321     -1.33     0.1947       1.32314 
c4            1     -0.16725     0.20321     -0.82     0.4186       0.50351 
c5            1     -0.02326     0.20321     -0.11     0.9098       0.00974 
c6            1     -0.11841     0.14369     -0.82     0.4180       0.50473 

Test Harmonics Results for Dependent Variable Y 

                          Mean 
Source          DF      Square   F Value   Pr > F 

Numerator        7     1.13584    © 1.53   0.2054 
Denominator     24     0.74330 






Output 7.10 Smoothed Periodogram

[Figure: TRIANGULAR WEIGHTS, PERIOD 12 PLUS 1 HARMONIC. Spectral Density of Y plotted against
frequency from 0 to PI.]


7.6 Extremely Fast Fluctuations and Aliasing 

Suppose a series actually has a frequency larger (faster fluctuations) than the Nyquist frequency $\pi$
radians per observation, for example, $4\pi/3 > \pi$. Imagine a wheel with a dot on its edge, and an
observer who looks at the wheel each second. If the wheel rotates clockwise $4\pi/3$ radians per
second, at the first observation, the dot will now be $2\pi/3$ radians counterclockwise (i.e., $-2\pi/3$
radians) from its previous position, and similarly for subsequent observations. Based on the dot's
position, the observer only knows that the frequency of rotation is $-2\pi/3 + 2\pi j$ for some integer j.
These frequencies are all said to be aliased with $-2\pi/3$, where this frequency was selected because it
is in the interval $[-\pi, \pi]$. Another alias will be seen to be $2\pi/3$, as though the observer had moved to
the other side of the wheel.
Because $A\cos(\omega t) + B\sin(\omega t) = A\cos(-\omega t) - B\sin(-\omega t)$, it is not possible to distinguish a cycle of
frequency $\omega$ from one of $-\omega$ using the periodogram. Thus it is sufficient and customary to compute
periodogram ordinates at the Fourier frequencies $2\pi j/n$ with $j = 1, 2, \ldots, m$ so that $0 < 2\pi j/n \le \pi$.
Recall that the number of periodogram ordinates m is either $(n-1)/2$ if n is odd or $n/2$ if n is even.

Imagine a number line with reference points at $\pi j$ for all integers j, positive, negative, and zero.
Folding that line back and forth in accordion fashion at these reference points maps the whole line
into the interval $[0, \pi]$. The set of points that map into any $\omega$ are its aliases. For that reason, the
Nyquist frequency $\pi$ is also referred to as the folding frequency. The reason that this frequency has
names instead of always being called $\pi$ is that some people prefer radians or cycles per second, per
hour, per day, etc., rather than radians per observation as a unit of measure. If observations are taken
every 15 minutes, the Nyquist frequency $\pi$ radians per observation would convert to $4\pi$ radians per
hour, or 2 cycles per hour. In this book, radians per observation and the Nyquist frequency $\pi$ will be
the standard.

When the periodogram is plotted over $[0, \pi]$ and there appears to be a cycle at a bizarre frequency in
$[0, \pi]$, ask yourself if this might be coming from a cycle beyond the Nyquist frequency.
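
The folding idea can be checked numerically. The following is a minimal Python sketch (not SAS, and not part of the original programs); the helper name fold_to_principal is hypothetical. It folds a frequency into the interval from 0 to pi and confirms that, at integer observation times, a cosine at 4*pi/3 cannot be told apart from one at its alias 2*pi/3.

```python
import math

def fold_to_principal(omega):
    """Fold a frequency (radians per observation) into [0, pi]."""
    w = math.fmod(abs(omega), 2 * math.pi)   # aliases differ by multiples of 2*pi
    return 2 * math.pi - w if w > math.pi else w

true_freq = 4 * math.pi / 3            # faster than the Nyquist frequency pi
alias = fold_to_principal(true_freq)   # folds back to 2*pi/3

# Observed only at integer times t, the two cosines coincide.
max_gap = max(abs(math.cos(true_freq * t) - math.cos(alias * t)) for t in range(50))
print(alias, max_gap)
```

The same function applied to any frequency of the form -2*pi/3 + 2*pi*j returns the same alias, which is the accordion folding described above.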


7.7 The Spectral Density 


Consider three processes: $W_t = 10 + (5/3)e_t$, $Y_t = 10 + .8(Y_{t-1} - 10) + e_t$, and
$Z_t = 10 - .8(Z_{t-1} - 10) + e_t$, where $e_t \sim N(0, 0.36)$ is white noise. Each process has mean 10 and
variance 1.

The spectral density function of a process is defined as

$$f(\omega) = \frac{1}{2\pi}\sum_{h=-\infty}^{\infty}\gamma(h)\cos(\omega h),$$

where $\gamma(h)$ is the autocovariance function. The function is symmetric: $f(\omega) = f(-\omega)$. For $W_t$ the
variance is $\gamma(0) = \sigma^2 = 1$ and $\gamma(h) = 0$ if h is not 0. The spectral density for $W_t$ becomes just

$$f_W(\omega) = \frac{1}{2\pi}\sum_{h=-\infty}^{\infty}\gamma(h)\cos(\omega h) = \frac{1}{2\pi}\gamma(0)\cos(0) = \frac{1}{2\pi}$$

and is $\sigma^2/(2\pi)$ for a general white noise series with variance $\sigma^2$. Sometimes the spectral density is
plotted over the interval $-\pi < \omega < \pi$. Since for white noise, $f(\omega)$ is $\sigma^2/(2\pi)$, the plot is just a
rectangle of height $\sigma^2/(2\pi)$ over an interval of width $2\pi$. The area of the rectangle,
$2\pi \cdot \sigma^2/(2\pi) = \sigma^2$, is the variance of the series, and this (area = variance) will be true
in general of a spectral density whether or not it is plotted as a rectangle.
Because the plot of $f_W(\omega)$ has equal height at each $\omega$, it is said that all frequencies contribute
equally to the variance of $W_t$. This is the same idea as white light, where all frequencies of the light
spectrum are equally represented, or white noise in acoustics. In other words the time series is
conceptualized as the sum of sinusoids at various frequencies with white noise having equal
contributions for all frequencies. In general, then, the interpretation of the spectral density is the
decomposition of the variance of a process into components at different frequencies.

An interesting mathematical fact is that if the periodogram is computed for data from any ARMA
model, the periodogram ordinate at any Fourier frequency $0 < \omega < \pi$ estimates $4\pi f(\omega)$; that is,
$4\pi f(\omega)$ is (approximately) the periodogram ordinate's expected value. Dividing the periodogram
ordinate by $4\pi$ thus gives an almost unbiased estimate of $f(\omega)$. If the plot over $-\pi < \omega < \pi$ is
desired (so that the area under the curve is the variance), use the symmetry of $f(\omega)$ and plot the
estimate at both $\omega$ and $-\omega$. For white noise, of course, the estimate of $f(\omega)$ targets the same value
($\sigma^2/2\pi$) at each $\omega$, and so averaging several of these estimates gives an even better estimate. You
will see that local averaging of estimates often, but not always, improves estimation. Often only the
positive frequency half of the estimated spectral density is plotted, and it is left to the reader to
remember that the variance is twice the area of such a plot.
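
As a rough numerical illustration of the $4\pi f(\omega)$ relationship, here is a minimal Python sketch (not SAS). It assumes the periodogram scaling $(2/n)|\sum_t x_t e^{-i\omega t}|^2$, which agrees with the printed cosine and sine transforms and periodogram values in Output 7.14; the function name, seed, and series length are arbitrary choices for illustration. For simulated white noise with variance 0.36, the average ordinate should be near $4\pi \cdot 0.36/(2\pi) = 0.72$.

```python
import cmath
import math
import random

def periodogram(x):
    """Ordinates (2/n)|sum_t (x_t - mean) exp(-i w_j t)|^2 at w_j = 2*pi*j/n, j = 1..m."""
    n = len(x)
    m = (n - 1) // 2 if n % 2 else n // 2
    mean = sum(x) / n
    out = []
    for j in range(1, m + 1):
        w = 2 * math.pi * j / n
        s = sum((xt - mean) * cmath.exp(-1j * w * t) for t, xt in enumerate(x, start=1))
        out.append(2 / n * abs(s) ** 2)
    return out

random.seed(1)                                   # arbitrary seed
sigma2 = 0.36
x = [random.gauss(0, math.sqrt(sigma2)) for _ in range(401)]

ords = periodogram(x)
avg_ordinate = sum(ords) / len(ords)             # should be near 4*pi*f = 2*sigma2
f_hat = avg_ordinate / (4 * math.pi)             # averaged estimate of f(w)
f_true = sigma2 / (2 * math.pi)

xbar = sum(x) / len(x)
corrected_ss = sum((xt - xbar) ** 2 for xt in x)  # equals the sum of ordinates when n is odd
print(avg_ordinate, f_hat, f_true)
```

With an odd number of observations the ordinates add up exactly to the corrected sum of squares, which is the sample version of the statement that the spectrum decomposes the variance by frequency.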

What do the spectral densities of $Y_t$ and $Z_t$ look like? Using a little intuition, you would expect the
positively autocorrelated series $Y_t$ to fluctuate at a slower rate around its mean than does $W_t$.
Likewise you would expect the negatively autocorrelated series $Z_t$ to fluctuate faster than $W_t$ since,
for $Z_t$, positive deviations tend to be followed by negative and negative by positive. The slower
fluctuation in $Y_t$ should show up as longer period waves, that is, higher periodogram ordinates at
low frequencies. For $Z_t$ you'd expect the opposite: large contributions to the variance from
frequencies near $-\pi$ or $\pi$.

The three graphs at the top of Output 7.11 show the symmetrized periodograms for W, Y, and Z,
each computed from 1000 simulated values and having each ordinate plotted at the associated $\omega$ and
its negative to show the full symmetric spectrum. The behavior is as expected: high values near
$\omega = 0$ indicating low-frequency waves in $Y_t$, high values near the extreme $\omega$'s for $Z_t$ indicating
high-frequency fluctuations, and a flat spectrum for $W_t$. Two other periodograms are shown. The
first, in the bottom-left corner, is for $D_t = Y_t - Y_{t-1}$. Because $D_t$ is a moving linear combination of
$Y_t$ values, $D_t$ is referred to as a filtered version of $Y_t$. Note that if the filter $Y_t - .8Y_{t-1}$ had been
applied, the filtered series would just be white noise and the spectral density just a horizontal line. It
is seen that linear filtering of this sort is a way of altering the spectral density of a process. The
differencing filter has overcompensated for the autocorrelation, depressing the middle (near 0
frequency) periodogram ordinates of $D_t$ too much so that instead of being level, the periodogram
dips down to 0 at the middle.
In the wide middle panel of Output 7.11, a sinusoid $0.2\sin(2\pi t/25 + .1)$ has been added to $Y_t$ and
the first 200 observations from both the original and altered series have been plotted. Because the
amplitude of the sinusoid is so small, the plot of the altered $Y_t$ is nearly indistinguishable from that
of the original $Y_t$. The same is true of the autocorrelations (not shown). In contrast, the periodogram
in the middle of the bottom row shows a strong spike at the Fourier frequency
$2\pi/25 = 40(2\pi/1000) = 0.2513$ radians, clearly exposing the modification to $Y_t$. The middle graphs
in the top (original $Y_t$) and bottom rows are identical except for the spikes at frequency $\pm 0.2513$.

The bottom-right graph of Output 7.11 contains plots of three smoothed spectral density estimates
along with the theoretical spectral densities that they estimate. See Section 7.9 for details about
smoothing the periodogram to estimate the spectral density. The low horizontal line associated with
white noise has been discussed already. For autoregressive order 1 series, AR(1) like Y and Z, the
theoretical spectral density is

$$f(\omega) = \frac{\sigma^2}{2\pi} \Big/ (1 + \rho^2 - 2\rho\cos(\omega)),$$

where $\rho$ is the lag 1 autoregressive coefficient, .8 for Y and -.8 for Z. For a moving average (MA)
such as $X_t = e_t - \theta e_{t-1}$, the spectral density is

$$f_X(\omega) = \frac{\sigma^2}{2\pi}(1 + \theta^2 - 2\theta\cos(\omega)).$$

Both the AR and MA spectral densities involve the white noise spectral density $\sigma^2/(2\pi)$. It is either
multiplied (MA) or divided (AR) by a trigonometric function involving the ARMA coefficients. Note
that $X_t$ is a filtered version of $e_t$. If $X_t$ had been defined in terms of a more general time series $V_t$ as
$X_t = V_t - \theta V_{t-1}$, the spectral density of $X_t$ would have been similarly related to that of $V_t$ as
$f_X(\omega) = f_V(\omega)(1 + \theta^2 - 2\theta\cos(\omega))$, where $f_V(\omega)$ is the spectral density of $V_t$. As an example,
if $Y_t$ has spectral density

$$f_Y(\omega) = \frac{\sigma^2}{2\pi} \Big/ (1 + \rho^2 - 2\rho\cos(\omega))$$

and is filtered to get $D_t = Y_t - \theta Y_{t-1}$, then the spectral density of $D_t$ is

$$f_D(\omega) = (1 + \theta^2 - 2\theta\cos(\omega)) f_Y(\omega)
= \frac{\sigma^2}{2\pi}(1 + \theta^2 - 2\theta\cos(\omega)) / (1 + \rho^2 - 2\rho\cos(\omega))$$
Output 7.11 Spectral Graphics (see text)


If $D_t = Y_t - Y_{t-1}$, then $\theta = 1$, so

$$f_D(\omega) = (2 - 2\cos(\omega))\frac{\sigma^2}{2\pi} \Big/ (1 + \rho^2 - 2\rho\cos(\omega))$$

which is seen to be 0 at frequency $\omega = 0$. This gives some insight into the behavior of the $D_t$
periodogram displayed in the bottom-left corner of Output 7.11. Filtering affects different
frequencies in different ways. The multiplier associated with the filter, such as $(1 + \theta^2 - 2\theta\cos(\omega))$ in
the examples above, is sometimes called the squared gain of the filter in that amplitudes of some
waves get increased (gain > 1) and some get reduced (gain < 1). Designing filters to amplify certain
frequencies and reduce or eliminate others has been studied in some fields. The term squared gain is
used because the spectral density decomposes the variance, not the standard deviation, into frequency
components.
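
The AR(1) density and the squared gain are easy to evaluate directly. The following minimal Python sketch (not SAS; the function names and the midpoint-rule integration are illustrative choices, not the book's programs) uses the values from this section, $\rho = .8$ and $\sigma^2 = 0.36$, to confirm that the area under $f_Y$ is the variance 1, that the differenced series has density 0 at $\omega = 0$, and that the filter $Y_t - .8Y_{t-1}$ returns the flat white noise density.

```python
import math

def ar1_density(w, rho, sigma2):
    """f(w) = (sigma2 / 2pi) / (1 + rho^2 - 2 rho cos w)."""
    return sigma2 / (2 * math.pi) / (1 + rho ** 2 - 2 * rho * math.cos(w))

def squared_gain(w, theta):
    """Multiplier applied to the density by the filter Y_t - theta * Y_{t-1}."""
    return 1 + theta ** 2 - 2 * theta * math.cos(w)

def area(f, steps=4000):
    """Midpoint-rule area under f over (-pi, pi)."""
    h = 2 * math.pi / steps
    return h * sum(f(-math.pi + (k + 0.5) * h) for k in range(steps))

def f_y(w):
    return ar1_density(w, 0.8, 0.36)

def f_d(w):                      # differenced series D_t = Y_t - Y_{t-1}
    return squared_gain(w, 1.0) * f_y(w)

def f_white(w):                  # filter Y_t - .8 Y_{t-1} recovers white noise
    return squared_gain(w, 0.8) * f_y(w)

var_y = area(f_y)                # area = variance of Y
var_d = area(f_d)                # area = variance of D
print(var_y, var_d, f_d(0.0), f_white(1.0))
```

The area under $f_D$ matches the variance of the differenced series computed from the autocovariances, $2\gamma(0) - 2\gamma(1) = 2 - 1.6 = 0.4$.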


7.8 Some Mathematical Detail (Optional Reading) 


The spectral densities for general ARMA processes can be defined in terms of complex exponentials
$e^{i\omega} = \cos(\omega) + i\sin(\omega)$. Here i represents an imaginary number whose square is -1. Although that
concept may be hard to grasp, calculations done with such terms often result ultimately in
expressions not involving i, so the use of i along the way is a convenient mechanism for calculation
of quantities that ultimately do not involve imaginary numbers.

Using the backshift operator B, the ARMA(p,q) model is expressed as

$$(1 - \alpha_1 B - \cdots - \alpha_p B^p)Y_t = (1 - \theta_1 B - \cdots - \theta_q B^q)e_t$$

You now understand that these expressions in the backshift can be correctly referred to as filters.
Replace $B^j$ with $e^{i\omega j}$, getting $A(\omega) = (1 - \alpha_1 e^{i\omega} - \alpha_2 e^{i2\omega} - \cdots - \alpha_p e^{ip\omega})$ on the autoregressive and
$M(\omega) = (1 - \theta_1 e^{i\omega} - \theta_2 e^{i2\omega} - \cdots - \theta_q e^{iq\omega})$ on the moving average side. The complex polynomials $A(\omega)$
and $M(\omega)$ have corresponding complex conjugate expressions $A^*(\omega)$ and $M^*(\omega)$ obtained by
replacing $e^{i\omega} = \cos(\omega) + i\sin(\omega)$ everywhere with $e^{-i\omega} = \cos(\omega) - i\sin(\omega)$. Start with the spectral
density of $e_t$, which is $\sigma^2/(2\pi)$. The spectral density for the ARMA(p,q) process $Y_t$ becomes

$$f_Y(\omega) = \frac{\sigma^2}{2\pi}\,\frac{M(\omega)M^*(\omega)}{A(\omega)A^*(\omega)}$$

When a complex expression is multiplied by its complex conjugate, the product involves only real
numbers and is nonnegative.
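
The ratio of conjugate products can be evaluated with ordinary complex arithmetic. This is a minimal Python sketch (not SAS; the function name and argument layout are hypothetical) that builds $A(\omega)$ and $M(\omega)$ from coefficient lists and checks the result against the AR(1) and MA(1) closed forms given in Section 7.7.

```python
import cmath
import math

def arma_density(w, ar, ma, sigma2):
    """(sigma2/2pi) * M(w)M*(w) / (A(w)A*(w)).

    ar = [alpha_1, ..., alpha_p] and ma = [theta_1, ..., theta_q], with the
    sign convention (1 - alpha_1 B - ...)Y_t = (1 - theta_1 B - ...)e_t.
    """
    a = 1 - sum(c * cmath.exp(1j * w * (k + 1)) for k, c in enumerate(ar))
    m = 1 - sum(c * cmath.exp(1j * w * (k + 1)) for k, c in enumerate(ma))
    # abs(z)**2 is z times its complex conjugate, a real nonnegative number
    return sigma2 / (2 * math.pi) * abs(m) ** 2 / abs(a) ** 2

w = 1.0
ar1_general = arma_density(w, [0.8], [], 0.36)
ar1_closed = 0.36 / (2 * math.pi) / (1 + 0.8 ** 2 - 2 * 0.8 * math.cos(w))
ma1_general = arma_density(w, [], [0.5], 1.0)
ma1_closed = 1.0 / (2 * math.pi) * (1 + 0.5 ** 2 - 2 * 0.5 * math.cos(w))
print(ar1_general, ar1_closed, ma1_general, ma1_closed)
```

A moving average unit root, for example `arma_density(0.0, [], [1.0], 1.0)`, returns 0, while an autoregressive unit root makes the denominator 0 at the corresponding frequency.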
If there are unit roots on the autoregressive side, the denominator, $A(\omega)A^*(\omega)$, will be zero for some
$\omega$ and the theoretical expression for $f_Y(\omega)$ will be undefined there. Of course such a process does
not have a covariance function $\gamma(h)$ that is a function of h only, so the spectral density
$\frac{1}{2\pi}\sum_{h=-\infty}^{\infty}\gamma(h)\cos(\omega h)$ cannot exist either. Despite the fact that $f(\omega)$ does not exist for unit
root autoregressions, the periodogram can still be computed. Akdi and Dickey (1997) and Evans
(1998) discuss normalization and distributional properties for the periodogram in this situation.
Although they find the expected overall gross behavior (extremely large ordinates near frequency 0),
they also find some interesting distributional departures from the stationary case. Unit roots on the
moving average side are not a problem; they simply cause $f_Y(\omega)$ to be 0 at some $\omega$ values. An
example of this is $D_t$, whose periodogram is shown in the bottom-left corner of Output 7.11 and
whose spectral density is 0 at $\omega = 0$.


7.9 Estimating the Spectrum: The Smoothed Periodogram 

The graph in the bottom-right corner of Output 7.11 contains theoretical spectral densities for $W_t$,
$Y_t$, and $Z_t$ as well as estimates derived from the periodogram plotted symmetrically over the full
interval $-\pi < \omega < \pi$. These estimates are derived by locally smoothing the periodogram. Smoothed
estimates are local weighted averages, and in that picture a simple average of 21 ordinates centered at
the frequency of interest is taken and then divided by $4\pi$. It is seen that these are good estimates of
the theoretical spectral densities that are overlaid in the plot. The $4\pi$ divisor is used so that the area
under the spectral density curve over $-\pi < \omega < \pi$ will be the series variance. Weighted averages
concentrated more on the ordinates near the one of interest can also be used. In the PROC SPECTRA
output data set, the spectral density estimates are named S_01, S_02, etc., and the periodogram
ordinates are named P_01, P_02, etc., for variables in the order listed in the VAR statement.

Let $I_n(\omega_j)$ denote the periodogram ordinate at Fourier frequency $\omega_j = 2\pi j/n$ constructed from n
observations on some time series $Y_t$. Suppose you issue these statements:

   PROC SPECTRA P S ADJMEAN OUT=OUTSPEC;
      WEIGHTS 1 2 3 4 3 2 1;
      VAR X R Y;
   RUN;

The smoothed spectral density estimate for Y will have variable name S_03 in the data set
OUTSPEC and for $j > 2$ will be computed as

$$[I_n(\omega_{j-3}) + 2I_n(\omega_{j-2}) + 3I_n(\omega_{j-1}) + 4I_n(\omega_j) + 3I_n(\omega_{j+1}) + 2I_n(\omega_{j+2}) + I_n(\omega_{j+3})]/(16(4\pi)),$$

where the divisor 16 is the sum of the numbers in the WEIGHTS statement. Modifications are needed
for $j < 4$ and $j > m - 3$, where m is the number of ordinates. Output 7.10 shows the results of smoothing
the Output 7.8 periodogram. Note that much of the detail has been lost.
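
The weighted average itself is simple to reproduce. The following minimal Python sketch (not SAS) applies the 1 2 3 4 3 2 1 weights and the $16(4\pi)$ divisor to a list of ordinates. It is an illustration only: the function name is hypothetical, the ordinates are made up, and the sketch simply skips the end positions rather than imitating the end modifications that PROC SPECTRA makes.

```python
import math

def smooth_interior(ordinates, weights):
    """Weighted moving average of periodogram ordinates divided by 4*pi.

    Returns {position: estimate} for interior positions only; positions too
    close to either end for a full window are left out (an assumption of this sketch).
    """
    half = len(weights) // 2
    total = sum(weights)
    est = {}
    for j in range(half, len(ordinates) - half):
        window = ordinates[j - half: j + half + 1]
        est[j] = sum(wt * val for wt, val in zip(weights, window)) / (total * 4 * math.pi)
    return est

# Made-up periodogram: a single spike of height 16 at position 5
spike = [0, 0, 0, 0, 0, 16, 0, 0, 0, 0, 0]
smoothed = smooth_interior(spike, [1, 2, 3, 4, 3, 2, 1])
for j in sorted(smoothed):
    print(j, round(smoothed[j], 4))
```

The single spike is spread in a triangular pattern over the neighboring positions, which is why the sinusoids in Output 7.8 appear less sharply in Output 7.10.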
From the graphs in Output 7.8 and 7.10, it is seen that the sinusoids are indicated more strongly by
the unsmoothed P_01 than by the smoothed spectrum S_01. That is because the smoothing spreads 
the effect of the sinusoid into neighboring frequencies where the periodogram concentrates it entirely 
on the true underlying Fourier frequency. On the other hand, when the true spectrum is fairly smooth, 
as with X, Y, Z, and D in Output 7.11, the estimator should be smoothed. This presents a dilemma 
for the researcher who is trying to discover the nature of the true spectrum: the best way to smooth 
the spectrum for inspection is not known without knowing the nature of the true spectrum, in which 
case inspecting its estimate is of no interest. To address this, several graphs are made using different 
degrees of smoothing. The less smooth ones reveal spikes and the more smooth ones reveal the shape 
of the smooth regions of the spectrum. 

Dividing each periodogram ordinate by the corresponding spectral density $f(\omega)$ results in a set of
almost independent variables, each with approximately (exactly if the data are normal white noise) a
chi-square distribution with 2 degrees of freedom, a highly variable distribution. The weights applied
to produce the spectral density lower the variance while usually introducing a bias. The set of
weights is called a spectral window, and the effective number of periodogram ordinates involved in
an average is called the bandwidth of the window. The estimated spectral density approximates a
weighted average of the true spectral density in an interval surrounding the target frequency rather
than just at the target frequency. The interval is larger for larger bandwidths, and hence the
potential for bias is increased, whereas the variance of the estimate is decreased by increasing the
bandwidth.


7.10 Cross-Spectral Analysis 


7.10.1 Interpreting Cross-Spectral Quantities 

Interpreting cross-spectral quantities is closely related to the transfer function model in which an
output time series, Y, is related to an input time series, X, through the equation

$$Y_t = \sum_{j=-\infty}^{\infty} v_j X_{t-j} + \eta_t$$

and where $\eta_t$ is a time series independent of the input, X. For the moment, assume $\eta_t = 0$.

For example, let Y and X be related by the transfer function

$$Y_t - .8Y_{t-1} = X_t$$

Then

$$Y_t = \sum_{j=0}^{\infty} (.8)^j X_{t-j}$$

which is a weighted sum of current and previous inputs.

Cross-spectral quantities tell you what happens to sinusoidal inputs. In the example, suppose $X_t$ is the
sinusoid

$$X_t = \sin(\omega t)$$
where

$$\omega = 2\pi/12$$

Using trigonometric identities shows that $Y_t$ satisfying

$$Y_t - .8Y_{t-1} = \sin(\omega t)$$

must be of the form

$$Y_t = A\sin(\omega t - B)$$

Solving

$$A\sin(\omega t - B) - .8A\sin(\omega t - B - \omega)$$
$$= (A\cos(B) - .8A\cos(B + \omega))\sin(\omega t) + (-A\sin(B) + .8A\sin(B + \omega))\cos(\omega t)$$
$$= \sin(\omega t)$$

you have

$$A\cos B - .8A\cos(B + \omega) = 1$$

and

$$-A\sin B + .8A\sin(B + \omega) = 0$$

The solution is

$$\tan(B) = .8\sin(\omega)/(1 - .8\cos(\omega)) = .4/(1 - (.4)\sqrt{3}) = 1.3022$$

and

$$A = 1/\{.6091 - .8[.6091(\sqrt{3}/2) - .7931(1/2)]\} = 1.9828$$

The transfer function produces output with amplitude 1.9828 times that of the input; it has the same
frequency and a phase shift of arctan(1.3022) = 52.5° = .92 radians. These results hold only for
$\omega = 2\pi/12$. The output for any noiseless linear transfer function is a sinusoid of frequency $\omega$ when
the input X is such a sinusoid. Only the amplitude and phase are changed.
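
The amplitude and phase above can be cross-checked in two ways. This minimal Python sketch (not SAS; the variable names are illustrative) computes them from the complex response $1/(1 - .8e^{-i\omega})$ of the filter to a sinusoid at frequency $\omega$, and then runs the recursion $Y_t = .8Y_{t-1} + \sin(\omega t)$ long enough for start-up effects to die out, comparing the result with $A\sin(\omega t - B)$.

```python
import cmath
import math

w = 2 * math.pi / 12
h = 1 / (1 - 0.8 * cmath.exp(-1j * w))   # response of Y_t - .8 Y_{t-1} = X_t at frequency w
A = abs(h)                                # amplitude multiplier
B = -cmath.phase(h)                       # phase lag in radians

# Brute-force check: iterate the recursion from a zero start.
y = 0.0
for t in range(1, 401):
    y = 0.8 * y + math.sin(w * t)
steady = A * math.sin(w * 400 - B)        # closed form at t = 400

print(A, math.tan(B), math.degrees(B), y, steady)
```

Repeating the calculation for other values of w traces out how the amplitude multiplier and phase lag change with frequency, which is what cross-spectral analysis estimates from data.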

In cross-spectral analysis, using arbitrary input and its associated output, you simultaneously estimate 
the gain and phase at all Fourier frequencies. An intermediate step is the computation of quantities 
called cospectrum and quadrature spectrum. 
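The theoretical cross-spectrum, $f_{xy}(\omega)$, is the Fourier transform of the cross-covariance function,
$\gamma_{xy}(h)$, where

$$\gamma_{xy}(h) = E\{[X_t - E(X_t)][Y_{t+h} - E(Y_t)]\}$$

The real part of $f_{xy}(\omega) = c(\omega) - iq(\omega)$ is the cospectrum, $c(\omega)$, and the imaginary part gives the
quadrature spectrum, $q(\omega)$. In the example

$$Y_t - .8Y_{t-1} = X_t$$

multiply both sides by $X_{t-h}$ and take the expected value. You obtain

$$\gamma_{xy}(h) - .8\gamma_{xy}(h-1) = \gamma_{xx}(h)$$

where $\gamma_{xx}(h)$ is the autocovariance function for X.

Now when $\gamma(h)$ is absolutely summable,

$$f(\omega) = (2\pi)^{-1}\sum_{h=-\infty}^{\infty}\gamma(h)e^{-i\omega h}$$

From these last two equations

$$(2\pi)^{-1}\sum_{h=-\infty}^{\infty}\left(\gamma_{xy}(h)e^{-i\omega h} - .8\gamma_{xy}(h-1)e^{-i\omega h}\right) = (2\pi)^{-1}\sum_{h=-\infty}^{\infty}\gamma_{xx}(h)e^{-i\omega h}$$

or

$$(2\pi)^{-1}\sum_{h=-\infty}^{\infty}\left(\gamma_{xy}(h)e^{-i\omega h} - .8\gamma_{xy}(h-1)e^{-i\omega(h-1)}e^{-i\omega}\right) = (2\pi)^{-1}\sum_{h=-\infty}^{\infty}\gamma_{xx}(h)e^{-i\omega h}$$

or

$$f_{xy}(\omega) - .8f_{xy}(\omega)e^{-i\omega} = f_{xx}(\omega)$$

However,

$$e^{-i\omega} = \cos(\omega) - i\sin(\omega)$$

so

$$f_{xy}(\omega)(1 - .8\cos(\omega) + .8i\sin(\omega)) = f_{xx}(\omega)$$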
Multiplying and dividing the left side by the complex conjugate $(1 - .8\cos(\omega) - .8i\sin(\omega))$, you
obtain

$$f_{xy}(\omega) = (1 - .8\cos(\omega) - .8i\sin(\omega))/(1.64 - 1.6\cos(\omega))\, f_{xx}(\omega)$$

You then have the cospectrum of X by Y (that of Y by X is the same)

$$c(\omega) = f_{xx}(\omega)(1 - .8\cos(\omega))/(1.64 - 1.6\cos(\omega))$$

and the quadrature spectrum of X by Y (that of Y by X is $-q(\omega)$)

$$q(\omega) = f_{xx}(\omega)\{.8\sin(\omega)/[1.64 - 1.6\cos(\omega)]\}$$

In Output 7.12 the cospectrum and quadrature spectrum of Y by X along with their
estimates from PROC SPECTRA are graphed for the case

$$X_t = .5X_{t-1} + e_t$$

7.10.2 Interpreting Cross-Amplitude and Phase Spectra 

The cross-amplitude spectrum is defined as

$$A_{xy}(\omega) = |f_{xy}(\omega)| = (c^2(\omega) + q^2(\omega))^{0.5}$$

In this example,

$$A_{xy}(\omega) = (1.64 - 1.6\cos(\omega))^{-0.5} f_{xx}(\omega)$$

The gain is defined as the amplitude divided by the spectral density of X, or

$$A_{xy}(\omega)/f_{xx}(\omega)$$

provided $f_{xx}(\omega) \ne 0$. Thus, the gain is the multiplier applied to the sinusoidal component of X at
frequency $\omega$ to obtain the amplitude of the frequency $\omega$ component of Y in a noiseless transfer
function, in our case $(1.64 - 1.6\cos(\omega))^{-0.5}$.

The phase spectrum $\Psi_{xy}(\omega)$ of X by Y is defined as

$$\Psi_{xy}(\omega) = \arctan(q(\omega)/c(\omega))$$

and that of Y by X is $\arctan(-q(\omega)/c(\omega))$.
This is the phase difference between the output and input at frequency $\omega$. In this example,

$$\Psi_{yx}(\omega) = \arctan\{-.8\sin(\omega)/[1 - .8\cos(\omega)]\}$$

These cross-amplitude and phase spectra are graphed along with their estimates from PROC
SPECTRA in Output 7.12. The graphs explain the effect of the transfer function on a sinusoidal
input. Its amplitude is changed $(A_{xy}(\omega))$, and it undergoes a phase shift $(\Psi_{xy}(\omega))$. The graphs show
how these changes are a function of frequency $(\omega)$. The cross-spectrum can be expressed as

$$f_{xy}(\omega) = A_{xy}(\omega)\exp(i\Psi_{xy}(\omega))$$

Transfer function relationships are not perfect (noiseless), so an error series is introduced into the
model as

$$Y_t = \sum_{j=-\infty}^{\infty} v_j X_{t-j} + \eta_t$$

where $\eta_t$ is uncorrelated with X. Now, in analogy to the correlation coefficient, the squared
coherency is defined as

$$K^2_{xy}(\omega) = |f_{xy}(\omega)|^2/(f_{xx}(\omega)f_{yy}(\omega))$$

This measures the strength of the relationship between X and Y as a function of frequency. The
spectrum $f_{\eta\eta}(\omega)$ of $\eta_t$ satisfies

$$f_{\eta\eta}(\omega) = f_{yy}(\omega) - |f_{xy}(\omega)|^2/f_{xx}(\omega) = f_{yy}(\omega)(1 - K^2_{xy}(\omega))$$

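To compute the theoretical coherency for the example, you need assumptions on X and $\eta_t$. Assume

$$X_t = .5X_{t-1} + e_t$$

with $\mathrm{var}(e_t) = 1$ and $\eta_t$ is white noise with variance 1. Then $f_{xx}(\omega) = (2\pi)^{-1}/(1.25 - \cos(\omega))$ and
$f_{\eta\eta}(\omega) = (2\pi)^{-1}$, so that $f_{\eta\eta}(\omega)/f_{xx}(\omega) = 1.25 - \cos(\omega)$ and

$$K^2_{xy}(\omega) = \{1 + [1.64 - 1.6\cos(\omega)][1.25 - \cos(\omega)]\}^{-1}$$

The true squared coherency and its estimate from PROC SPECTRA for the example are also graphed
in Output 7.12.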
7.10.3 PROC SPECTRA Statements

PROC SPECTRA gives these names to estimates of the cross-spectral quantities for the first two
variables in the VAR list:

   Cospectrum                  CS_01_02
   Quadrature Spectrum         QS_01_02
   Cross-Amplitude Spectrum    A_01_02
   Phase Spectrum              PH_01_02
   Squared Coherency           K_01_02

PROC SPECTRA options for cross-spectral analysis are as follows:

   PROC SPECTRA DATA=IN OUT=O1 COEF P S
                CROSS A K PH WHITETEST ADJMEAN;
      VAR Y1 Y2;
      WEIGHTS 1 1 1 1 1;
   RUN;

CROSS indicates that cross-spectral analysis is to be done. It produces the cospectrum CS_01_02 and
the quadrature spectrum QS_01_02 when used in conjunction with S. CROSS produces the real part
RP_01_02 and the imaginary part IP_01_02 of the cross-periodogram when used in conjunction with
P. Thus, RP and IP are unweighted estimates, and CS and QS are weighted and normalized estimates of
the cospectrum and quadrature spectrum. A, K, and PH request, respectively, estimation of
cross-amplitude, squared coherency, and phase spectra (CROSS must be specified also). Weighting is
necessary to obtain a valid estimate of the squared coherency.

Consider the following 512 observations $Y_t$ generated from the model

$$V_t = .8V_{t-1} + X_t$$ (the noiseless transfer function)

and

$$Y_t = V_t + \eta_t$$ (adding a noise term)

where $X_t$ is an autoregressive (AR) process

$$X_t = .5X_{t-1} + e_t$$

with variance 1.3333 and where $\eta_t$ is white noise with variance 1.

The following SAS code produces appropriate spectral estimates:

   PROC SPECTRA DATA=A OUT=OOO P S CROSS A K PH;
      WEIGHTS 1 1 1 1 1 1 1 1 1 1 1;
      VAR Y X;
   RUN;


Plots of estimated and true spectra are overlaid in Output 7.12. 
Output 7.12 Plots of Estimated and True Spectra



Although the data are artificial, think of X and Y as representing furnace and room temperatures in a
building. The phase spectrum shows that long-term fluctuations ($\omega$ near zero) and short-term
fluctuations ($\omega$ near $\pi$) for furnace and room temperatures are nearly in phase. The phase spectrum
starts at zero and then decreases, indicating that X (the furnace temperature) tends to peak slightly
before room temperature at intermediate frequencies. This makes sense if the furnace is connected to
the room by a reasonably long pipe.

The squared coherency is near one at low frequencies, indicating a strong correlation between room 
temperature and furnace temperature at low frequencies. The squared coherency becomes smaller at 
the higher frequencies in this example. The estimated phase spectrum can vary at high frequencies as 
a result of this low correlation between furnace and room temperatures at high frequencies. Because 
of mixing as the air travels from the furnace to the room, high-frequency oscillations in furnace 
temperatures tend not to be strongly associated with temperature fluctuations in the room. 

The gain, A_01_02/S_02, behaves like the cross-amplitude spectrum A_01_02 for this example. This 
behavior shows that low-frequency fluctuations in the furnace produce high-amplitude fluctuations at 
room temperature, while high-frequency fluctuations produce low-amplitude (small variance) 
fluctuations at room temperature. The transfer function tends to smooth the high-frequency 
fluctuations. Because of mixing in the pipe leading from the furnace to the room, it is not surprising 
that high-frequency (fast oscillation) temperature changes in the furnace are not transferred to the 
room. 
7.10.4 Cross-Spectral Analysis of the Neuse River Data

In Chapter 3, "The General ARIMA Model," the differenced log flow rates of the Neuse River at
Kinston (Y) and Goldsboro (X) are analyzed with the transfer function model

$$(1 - 1.241B + .291B^2 + .117B^3)X_t = (1 - .874B)e_t$$

or

$$X_t - 1.241X_{t-1} + .291X_{t-2} + .117X_{t-3} = e_t - .874e_{t-1}$$

with

$$\sigma_e^2 = .0399$$

and

$$Y_t = .495X_{t-1} + .273X_{t-2} + \varepsilon_t$$

where

$$\varepsilon_t = 1.163\varepsilon_{t-1} - .48\varepsilon_{t-2} + v_t - .888v_{t-1}$$

and $v_t$ is a white noise series with

$$\sigma_v^2 = .0058$$

The spectral quantities discussed above are computed and plotted using the estimated model
parameters. First, the model-based spectral quantities are developed. Then, the direct estimates (no
model) of the spectral quantities from PROC SPECTRA are plotted.

When the models above are used, the spectrum of Goldsboro is

$$f_{XX}(\omega) = \frac{(1 - .874e^{i\omega})(1 - .874e^{-i\omega})}{(1 - 1.241e^{i\omega} + .291e^{2i\omega} + .117e^{3i\omega})(1 - 1.241e^{-i\omega} + .291e^{-2i\omega} + .117e^{-3i\omega})}\,(.0399/(2\pi))$$

$$= [1 + .874^2 - 2(.874)\cos(\omega)] \,/\, \{[1 + 1.241^2 + .291^2 + .117^2]$$
$$- 2\cos(\omega)[1.241 + 1.241(.291) - .291(.117)] - 2\cos(2\omega)[1.241(.117) - .291]$$
$$+ 2\cos(3\omega)[.117]\}\;[.0399/(2\pi)]$$

Note that the cross-covariance of Y with X is the same as the cross-covariance of X with

$$.495X_{t-1} + .273X_{t-2}$$

so you obtain

$$\gamma_{XY}(j) = .495\gamma_{XX}(j - 1) + .273\gamma_{XX}(j - 2)$$

Thus, the cross-spectrum is

$$f_{XY}(\omega) = (.495e^{-i\omega} + .273e^{-2i\omega})f_{XX}(\omega)$$
The real part (cospectrum) is

$$c(\omega) = (.495\cos(\omega) + .273\cos(2\omega))f_{XX}(\omega)$$

and the quadrature spectrum is

$$q(\omega) = (.495\sin(\omega) + .273\sin(2\omega))f_{XX}(\omega)$$

The phase spectrum is

$$\Psi_{XY}(\omega) = \arctan(q(\omega)/c(\omega))$$

The spectrum of Kinston (Y) is

$$f_{YY}(\omega) = (.495e^{i\omega} + .273e^{2i\omega})(.495e^{-i\omega} + .273e^{-2i\omega})f_{XX}(\omega) + f_{\varepsilon\varepsilon}(\omega)$$

where

$$f_{\varepsilon\varepsilon}(\omega) = \frac{.0058}{2\pi}\,\frac{(1 - .888e^{i\omega})(1 - .888e^{-i\omega})}{(1 - 1.163e^{i\omega} + .48e^{2i\omega})(1 - 1.163e^{-i\omega} + .48e^{-2i\omega})}$$

The squared coherency is simply

$$K^2_{XY}(\omega) = |f_{XY}(\omega)|^2/(f_{XX}(\omega)f_{YY}(\omega))$$

Consider this pure delay transfer function model:

$$Y_t = \beta X_{t-c}$$

Using the Fourier transform, you can show the following relationship:

$$f_{XY}(\omega) = (2\pi)^{-1}\sum_{h=-\infty}^{\infty}\gamma_{XY}(h)e^{i\omega h} = \beta(2\pi)^{-1}\sum_{h=-\infty}^{\infty}\gamma_{XX}(h - c)e^{i\omega(h - c)}e^{ic\omega}$$

so

$$f_{XY}(\omega) = f_{XX}(\omega)\beta(\cos(c\omega) + i\sin(c\omega))$$

Thus, the phase spectrum is

$$\Psi_{XY}(\omega) = \arctan(\tan(c\omega)) = c\omega$$

When you use the ordinates in the plot of the phase spectrum as dependent variable values and
frequency as the independent variable, a simple linear regression using a few low frequencies gives
1.34 as an estimate of c. This indicates a lag of 1.34 days between Goldsboro and Kinston. Because
ARIMA models contain only integer lags, this information appears as two spikes, at lags 1 and 2, in
the prewhitened cross-correlations. However, with the cross-spectral approach, you are not restricted
to integer lags. In Output 7.13 the irregular plots are the cross-spectral estimates from PROC
SPECTRA. These are overlaid on the (smooth) plots computed above from the transfer function
fitted by PROC ARIMA.
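
The delay regression can be imitated with the model-based phase. The following minimal Python sketch (not SAS) evaluates arctan(q/c) for the fitted response .495X(t-1) + .273X(t-2) at a few low frequencies and fits a line through the origin. The frequency grid and the no-intercept fit are assumptions made for illustration; the book's regression uses the PROC SPECTRA phase estimates, not these model values.

```python
import math

def model_phase(w):
    """Phase arctan(q/c) for the response .495*X(t-1) + .273*X(t-2); the Goldsboro spectrum cancels."""
    c = 0.495 * math.cos(w) + 0.273 * math.cos(2 * w)
    q = 0.495 * math.sin(w) + 0.273 * math.sin(2 * w)
    return math.atan2(q, c)

freqs = [0.05 * k for k in range(1, 7)]        # a few low frequencies (hypothetical grid)
phases = [model_phase(w) for w in freqs]

# Least-squares slope through the origin for phase = c * omega
delay = sum(w * p for w, p in zip(freqs, phases)) / sum(w * w for w in freqs)
print(delay)
```

The slope comes out close to 1.35, in line with the 1.34 obtained from the data, and near zero frequency it approaches (.495 + 2(.273))/(.495 + .273), a weighted average of the two integer lags.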
Output 7.13 Overlaying the Smoothed Model-Derived Plots from PROC ARIMA and the Irregular
PROC SPECTRA Plots

[Figure, four panels:
MODEL AND DATA SPECTRA, LOG GOLDSBORO % CHANGE IN FLOW (LGOLD against frequency from 0 to PI);
MODEL AND DATA SPECTRA, LOG KINSTON % CHANGE IN FLOW (LKINS against frequency from 0 to PI);
PHASE SPECTRA FROM DATA AND MODEL (PHASE);
COHERENCIES, LOG % CHANGE KINS FLOW AND GOLD FLOW (COHER).]


From one viewpoint, the closeness of the PROC SPECTRA plots to the model-derived plots provides
a check on the ARIMA transfer function model and estimates. From another viewpoint, the
model-based spectral plots provide a highly smoothed version of the PROC SPECTRA output.


7.10.5 Details on Gain, Phase, and Pure Delay 

Suppose $X_t$ is a perfect sine wave $X_t = \alpha\sin(\omega t + \delta)$. Now suppose $Y_t = 3X_{t-1} = 3\alpha\sin(\omega t - \omega + \delta)$.
Y is also a perfect sine wave. The phase $-\omega + \delta$ of Y is $\omega$ radians less than the phase of X, and the
amplitude of Y is 3 times that of X. You could also say that the phase of X is $\omega$ radians more than
the phase of Y and the amplitude of X is 1/3 that of Y. You have seen that the idea of cross-spectral
analysis is to think of a general pair of series X and Y as each being composed of sinusoidal terms,
then estimating how the sinusoidal components of Y are related, in terms of amplitude and phase, to
those of the corresponding sinusoidal component of X.

With two series, Y and X, there is a phase of Y by X and a phase of X by Y. If $Y_t = 3X_{t-1}$, then Y is
behind X by 1 time unit; that is, the value of X at time t is a perfect predictor of Y at time t + 1.
Similarly X is ahead of Y by 1 time unit. This program creates $X_t = e_t$ and $Y_t = 3X_{t-1}$, so it is an
example of a simple noiseless transfer function. With $e_t \sim N(0, 1)$, the spectrum $f_{XX}(\omega)$ of X is
$f_{XX}(\omega) = 1/(2\pi) = 0.1592$ at all frequencies $\omega$, and $Y_t$ has spectrum $9/(2\pi) = 1.4324$.


   DATA A;
      PI = 4*ATAN(1);
      X = 0;
      DO T = 1 TO 64;
         Y = 3*X;   *Y IS 3 TIMES PREVIOUS X*;
         X = NORMAL(1827655);
         IF T=64 THEN X=0;
         OUTPUT;
      END;
   RUN;

   PROC SPECTRA DATA=A P S CROSS A K PH OUT=OUT1 COEFF;
      VAR X Y;
   RUN;

   PROC PRINT LABEL DATA=OUT1;
      WHERE PERIOD > 12;
      ID PERIOD FREQ;
   RUN;

Since no weights were specified, no smoothing has been done. Only a few frequencies are printed
out.


Output 7.14 X and Y Series

            Frequency     Cosine       Sine      Cosine       Sine
             from 0     Transform  Transform  Transform  Transform  Periodogram
   Period     to PI        of X       of X       of Y       of Y       of X

  64.0000    0.09817     0.16213   -0.09548    0.51212   -0.23739     1.13287
  32.0000    0.19635    -0.24721   -0.13649   -0.64748   -0.54628     2.55166
  21.3333    0.29452     0.09053    0.26364    0.03031    0.83572     2.48656
  16.0000    0.39270     0.37786   -0.15790    1.22856   -0.00383     5.36666
  12.8000    0.49087    -0.32669   -0.20429   -0.57543   -1.00251     4.75075

            Frequency                Spectral   Spectral      Real         Imag
             from 0    Periodogram   Density    Density    Periodogram  Periodogram
   Period     to PI       of Y         of X       of Y      of X by Y    of X by Y

  64.0000    0.09817     10.1959     0.09015    0.81136      3.3823      0.33312
  32.0000    0.19635     22.9650     0.20305    1.82749      7.5079      1.49341
  21.3333    0.29452     22.3791     0.19787    1.78087      7.1385      2.16543
  16.0000    0.39270     48.2999     0.42707    3.84359     14.8744      6.16119
  12.8000    0.49087     42.7567     0.37805    3.40247     12.5694      6.71846

            Frequency
             from 0    Cospectra   Quadrature  Coherency**2  Amplitude   Phase of
   Period     to PI    of X by Y   of X by Y    of X by Y    of X by Y    X by Y

  64.0000    0.09817    0.26915     0.02651         1         0.27045     0.09817
  32.0000    0.19635    0.59746     0.11884         1         0.60916     0.19635
  21.3333    0.29452    0.56806     0.17232         1         0.59362     0.29452
  16.0000    0.39270    1.18367     0.49029         1         1.28120     0.39270
  12.8000    0.49087    1.00024     0.53464         1         1.13416     0.49087


It is seen that at period 64, X has a component

$0.16213\cos(2\pi t/64) - 0.09548\sin(2\pi t/64) = 0.188156\sin(2\pi t/64 + 2.10302)$ and Y has a component

$0.51212\cos(2\pi t/64) - 0.23739\sin(2\pi t/64) = 0.564465\sin(2\pi t/64 + 2.00486)$, where
0.564465/0.188156 = 3 is the amplitude increase in going from X to Y. The phase shift is 2.10302 -
2.00486 = 0.09817 radians. Each periodogram ordinate is $(n/2)$ times the sum of squares of the two
coefficients, $(64/2)[(0.16213)^2 + (0.09548)^2] = 1.13287$ for X at period 64, for example.

Each Y periodogram ordinate is $3^2$ times the corresponding X periodogram ordinate. This exact
relationship would not hold if noise were added to Y. Within the class of ARMA models, the
periodogram $I_n(\omega)$ divided by $2\pi f(\omega)$ (where the true spectral density of the process is $f(\omega)$) has
approximately a chi-square distribution with 2 degrees of freedom, a distribution with mean 2. This
motivates $I_n(\omega)/(4\pi)$ as an estimator of $f(\omega)$ for both Y and X. Each spectral density estimator is the
corresponding periodogram ordinate divided by $4\pi$. For example, $1.13287/(4\pi) = 0.0902$ for X at
period 64.
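
The period-64 arithmetic above can be rechecked with a short DATA step. This is only a sketch: the
data set name CHECK64 and its variable names are made up for illustration, and the coefficients are
typed in from Output 7.14 rather than read from OUT1. It relies on the identity
a cos(u) + b sin(u) = R sin(u + phi), where R = sqrt(a**2 + b**2) and phi = ATAN2(a, b).

```sas
data check64;
   pi = 4*atan(1);
   n  = 64;                           /* series length */
   /* cosine and sine coefficients at period 64, typed in from Output 7.14 */
   a_x = 0.16213;  b_x = -0.09548;
   a_y = 0.51212;  b_y = -0.23739;
   /* a*cos(u) + b*sin(u) = R*sin(u + phi), with R*sin(phi) = a and R*cos(phi) = b */
   amp_x = sqrt(a_x**2 + b_x**2);  phi_x = atan2(a_x, b_x);
   amp_y = sqrt(a_y**2 + b_y**2);  phi_y = atan2(a_y, b_y);
   ratio = amp_y / amp_x;             /* amplitude multiplier in going from X to Y */
   shift = phi_x - phi_y;             /* phase shift in radians */
   p_x   = (n/2)*(a_x**2 + b_x**2);   /* periodogram ordinate of X */
   s_x   = p_x/(4*pi);                /* unsmoothed spectral density estimate of X */
   put amp_x= phi_x= amp_y= phi_y= ratio= shift= p_x= s_x=;
run;
```

Up to rounding in the printed coefficients, the values written to the log should agree with those
quoted in the text: a ratio near 3, a shift near 0.09817, and a periodogram ordinate near 1.13287.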

In the VAR statement of PROC SPECTRA, the order of variables is X Y, and you see that this
produces the phase of X by Y, not Y by X. The phase $-\omega + \delta$ of Y is $\omega$ radians less than the phase
$\delta$ of X as was shown above. Thus the entries in the phase column are exactly the same as the
frequencies. The plot of phase by frequency is a straight line with slope 1, and this slope gives the
pure delay d for $Y_t = CX_{t-d}$, so d = 1. Had the variables been listed in the order Y X, $-\omega$ would have
appeared as the phase spectrum estimate.

The slope of the phase plot near the origin gives some idea of the lag relationship between Y and X 
in a transfer function model with or without added noise, as long as the coherency there is reasonably 
strong. The delay need not be an integer, as was illustrated with the river data earlier. The phase plot 
of the generated data that simulated furnace and room temperatures had a negative slope near the 
origin. The room temperature Y is related to lagged furnace temperature X, and with the variables 
listed in the order Y X, the phase of Y by X is produced, giving the negative slope. Had the order 
been X Y, the plot would be reflected about the phase = 0 horizontal line and an initial positive slope 
would have been seen. For the river data, you see that the sites must have been listed in the order 
Goldsboro Kinston in PROC SPECTRA, since the phase slope is positive and Goldsboro (X) is 
upstream from Kinston (Y). 
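
One rough way to read a delay off the phase spectrum is to fit a line through the origin to the
low-frequency phases. The sketch below assumes the OUT1 data set from the noiseless example, in
which PROC SPECTRA names the phase of the first VAR variable by the second PH_01_02. The cutoff
FREQ < 1 is an arbitrary stand-in for "near the origin," not a recommendation from the text.

```sas
proc reg data=out1;
   where 0 < freq < 1;              /* low frequencies only; the cutoff is arbitrary */
   model ph_01_02 = freq / noint;   /* the slope estimates the pure delay d */
run;
quit;
```

For the noiseless example the fitted slope should be 1. With real data, the fit is only worth
reading over frequencies where the coherency is reasonably strong.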

If $Y_t = 3X_{t-1}$ and if $X_t$ has an absolutely summable covariance function $\gamma_{XX}(h)$, which is the case
in the current example, then Y also has a covariance function

$$\gamma_{YY}(h) = E\{Y_t Y_{t+h}\} = 9E\{X_{t-1}X_{t-1+h}\} = 9\gamma_{XX}(h)$$

By definition, the theoretical spectral density $f_{XX}(\omega)$ of X is the Fourier transform of the covariance
sequence: $f_{XX}(\omega) = \frac{1}{2\pi}\sum_{h=-\infty}^{\infty} e^{-i\omega h}\gamma_{XX}(h)$, and similarly $f_{YY}(\omega) = 9f_{XX}(\omega)$. The absolute
summability assumption ensures the existence of the theoretical spectral densities. The processes also
have a cross-covariance function

$$\gamma_{XY}(h) = E\{X_t Y_{t+h}\} = 3E\{X_t X_{t-1+h}\} = 3\gamma_{XX}(h-1)$$

whose Fourier transform is the cross-spectral density of Y by X:

$$
\begin{aligned}
f_{XY}(\omega) &= \frac{1}{2\pi}\sum_{h=-\infty}^{\infty} e^{-i\omega h}\gamma_{XY}(h)
               = \frac{3}{2\pi}\sum_{h=-\infty}^{\infty} e^{-i\omega h}\gamma_{XX}(h-1) \\
              &= \frac{3}{2\pi}\sum_{h=-\infty}^{\infty} [\cos(\omega) - i\sin(\omega)]\,e^{-i\omega(h-1)}\gamma_{XX}(h-1) \\
              &= 3[\cos(\omega) - i\sin(\omega)]\,f_{XX}(\omega)
\end{aligned}
$$

Writing $\frac{1}{2\pi}\sum_{h=-\infty}^{\infty} e^{-i\omega h}\gamma_{XY}(h)$ as $c(\omega) - iq(\omega)$, the real part $c(\omega)$ is the cospectrum of X by Y and
the coefficient of $-i$ is the quadrature spectrum $q(\omega)$. In this example $c(\omega) = 3\cos(\omega)f_{XX}(\omega)$ and
$q(\omega) = 3\sin(\omega)f_{XX}(\omega)$. For example, at period 32 you find $3\cos(2\pi/32) = 2.9424$ and
$3\sin(2\pi/32) = 0.5853$. Multiplying these by the estimated X spectral density gives
(2.9424)(0.20305) = 0.5974, the estimated cospectrum of X by Y for period 32, and similarly
(0.5853)(0.20305) = 0.1188, the estimated quadrature spectrum of X by Y on the printout.
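
The same comparison can be made at every printed frequency rather than at period 32 alone. A
minimal sketch, assuming the OUT1 data set from the PROC SPECTRA step above and its default output
names (S_01 for the spectral density estimate of X, CS_01_02 and QS_01_02 for the cospectrum and
quadrature spectrum of X by Y). The names CHECK_CQ, C_MODEL, and Q_MODEL are illustrative only.

```sas
data check_cq;
   set out1;
   where period > 12;
   c_model = 3*cos(freq)*s_01;   /* compare with CS_01_02 */
   q_model = 3*sin(freq)*s_01;   /* compare with QS_01_02 */
run;

proc print data=check_cq;
   id period freq;
   var cs_01_02 c_model qs_01_02 q_model;
run;
```

In this noiseless example the model-based columns should reproduce the estimated ones; with noise
added to Y they would agree only approximately.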

The phase and amplitude spectra are transformations of $q(\omega)$ and $c(\omega)$ and are often easier to
interpret. The phase of X by Y is $\mathrm{Atan}(q(\omega)/c(\omega)) = \mathrm{Atan}(\sin(\omega)/\cos(\omega)) = \omega$ and that of Y by X is
$-\omega$, as would be expected from the previous discussion of phase diagrams. The phase shows you the
lag relationship between the variables, as has been mentioned several times. For
$f_{XY}(\omega) = 3[\cos(\omega) - i\sin(\omega)]f_{XX}(\omega)$, the amplitude of the frequency $\omega$ component is

$\sqrt{c^2(\omega) + q^2(\omega)} = A(\omega) = 3\sqrt{\cos^2(\omega) + \sin^2(\omega)}\,f_{XX}(\omega) = 3f_{XX}(\omega)$. This is called the amplitude of X
by Y, and in the printout, each of these entries is the corresponding spectral density of X estimate
multiplied by 3. The quantity $A^2(\omega)/f_{XX}(\omega)$ is the spectral density for that part of Y that is exactly
related to X, without any added noise. Since Y is related to X by a noiseless transfer function, the
spectral density of Y should be $A^2(\omega)/f_{XX}(\omega)$. For example, at period 32 you find
$(0.60916)^2/(0.20305) = 1.82749$. Recall that the quantity $A(\omega)/f_{XX}(\omega)$ has been referred to earlier
as the “gain.” It represents the amplitude multiplier for the frequency $\omega$ component in going from X
to Y in a model where Y is related to X without noise. In our case the gain is thus

$3\sqrt{\cos^2(\omega) + \sin^2(\omega)} = 3$.
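
These relationships can be verified directly from the PROC SPECTRA output. The sketch below assumes
the OUT1 data set created earlier and the default output names A_01_02 (amplitude), PH_01_02 (phase),
S_01 and S_02 (spectral density estimates of X and Y). The data set CHECK_GAIN and the computed
variable names are hypothetical.

```sas
data check_gain;
   set out1;
   where period > 12;
   amp_calc   = sqrt(cs_01_02**2 + qs_01_02**2);   /* compare with A_01_02 */
   phase_calc = atan2(qs_01_02, cs_01_02);         /* compare with PH_01_02 */
   gain       = a_01_02 / s_01;                    /* amplitude divided by density of X */
   s_y_calc   = a_01_02**2 / s_01;                 /* compare with S_02 */
run;

proc print data=check_gain;
   id period freq;
   var a_01_02 amp_calc ph_01_02 phase_calc gain s_02 s_y_calc;
run;
```

For the noiseless example, GAIN should equal 3 at each frequency and S_Y_CALC should match the
spectral density estimate of Y.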

A more realistic scenario is that an observed series $W_t$ consists of $Y_t$ plus an added noise
component $N_t$ independent of X (and thus Y). Here the phase, amplitude, and gain using W and X
as data have their same interpretation, but refer to relationships between X and Y, that is, between X
and the part of W that is a direct transfer function of X. You can think of fluctuations in X over time
as providing energy that is transferred into Y, such as vibrations in an airplane engine transferred to
the wing or fuselage. The fluctuations in that object consist of the transferred energy plus
independent fluctuations such as wind movements while flying. The spectral density $f_{WW}(\omega)$ of W
will no longer be $A^2(\omega)/f_{XX}(\omega)$ but will be this plus the noise spectrum. In a system with noise,
then, the quantity $A^2(\omega)/[f_{XX}(\omega)f_{WW}(\omega)]$ provides an $R^2$ measure as a function of frequency. Its
symbol is $\kappa^2(\omega)$, and it is called the squared coherency. In a noiseless transfer function, like
$Y_t = 3X_{t-1}$, the squared coherency between Y and X would be 1 at all frequencies because
$[A^2(\omega)/f_{XX}(\omega)]/f_{YY}(\omega) = f_{YY}(\omega)/f_{YY}(\omega) = 1$ in that case. This appears in the output; however, in
the absence of smoothing weights, the squared coherency is really meaningless, as would be an $R^2$
of 1 in a simple linear regression with only 2 points.
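
To see a squared coherency that carries information, noise can be added to Y and smoothing weights
supplied. The following is a sketch under stated assumptions, not part of the original example: the
data set names B and OUT2, the noise standard deviation of 2, and the triangular weights 1 2 3 2 1
are all arbitrary choices made for illustration.

```sas
data b;
   x = 0;
   do t = 1 to 64;
      y = 3*x;                       /* noiseless transfer, as before */
      w = y + 2*normal(1827655);     /* observed series: Y plus independent noise */
      x = normal(1827655);           /* both calls draw from the same random stream */
      if t = 64 then x = 0;
      output;
   end;
run;

proc spectra data=b s cross a k ph out=out2;
   var x w;
   weights 1 2 3 2 1;                /* triangular smoothing weights */
run;

proc print label data=out2;
   where period > 12;
   id period freq;
   var k_01_02 a_01_02 ph_01_02;
run;
```

With noise and smoothing, the squared coherency K_01_02 is no longer forced to equal 1, and the
phase should follow the frequency only approximately.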

This small example without smoothing is presented to show and interpret the cross-spectral 
calculations. In practice, smoothing weights are usually applied so that more accurate estimates can 
be obtained. Another practical problem arises with the phase. The phase is usually computed as the
angle in $[-\pi/2, \pi/2]$ whose tangent is $q(\omega)/c(\omega)$. If a phase angle a little less than $\pi/2$ is followed
by one just a bit bigger than $\pi/2$, the interval restriction will cause this second angle to be reported
as an angle just a little bigger than $-\pi/2$. The phase diagram can thus show phases jumping back
and forth between $-\pi/2$ and $\pi/2$ when in fact they could be represented as not changing much at
all. Some practitioners choose to add and subtract multiples of $\pi$ from the phase at selected
frequencies in order to avoid excessive fluctuations in the plot.
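
The adjustment by multiples of $\pi$ can be automated in a DATA step. This is one possible sketch, not
a procedure from the text: it assumes the OUT1 data set with phase variable PH_01_02, and it shifts
each phase by whatever multiple of $\pi$ keeps it within $\pi/2$ of the previous adjusted phase. The names
UNWRAPPED, OFFSET, PREV, and PHASE_ADJ are hypothetical.

```sas
data unwrapped;
   set out1;
   retain offset 0 prev .;
   pi = 4*atan(1);
   if not missing(ph_01_02) then do;
      if not missing(prev) then do;
         /* remove apparent jumps larger than pi/2 relative to the previous adjusted phase */
         do while (ph_01_02 + offset - prev > pi/2);
            offset = offset - pi;
         end;
         do while (ph_01_02 + offset - prev < -pi/2);
            offset = offset + pi;
         end;
      end;
      phase_adj = ph_01_02 + offset;   /* adjusted phase for plotting */
      prev = phase_adj;
   end;
run;
```

Plotting PHASE_ADJ against FREQ then shows a phase curve without the artificial jumps. Whether a
given jump is artificial remains a judgment call, so the adjusted plot is best compared with the raw one.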

Fuller (1996) gives formulas for the cross-spectral estimates and confidence intervals for these 
quantities in the case that there are 2d +1 equal smoothing weights. 

8.1 Introduction

This chapter deals with the process of forecasting many time series with little intervention by the 
user. The goal is to illustrate a modern automated interface for a collection of forecasting models, 
including many that have been discussed so far. Most models herein, such as damped trend 
exponential smoothing and Winters method, are equivalent to specific ARIMA models. Some of 
these were developed in the literature without using ARIMA ideas and were later recognized as being 
ARIMA models. The examples focus on Web traffic data that accumulate very quickly over time and 
require a demanding warehousing and analytics strategy to automate the process. Analysis of such 
large amounts of data is often referred to as “data mining.” 

In this chapter SAS Web Analytics is used to read the Web traffic data, summarize the information
for detailed and historical analyses, and organize the information into a data warehouse. The SAS Web
Analytics reports provide important details about your Web traffic—who is visiting your site, how 
long they stay, and what material or pages they are viewing. This information can then be 
accumulated over time to construct a set of metrics that enables you to optimize your e-business 
investment. Results are displayed on the Web and accessed by an Internet browser. 

In addition, the SAS/ETS software Time Series Forecasting System (TSFS) is examined. This system 
provides a menu-driven interface to SAS/ETS and SAS/GRAPH procedures to facilitate quick and 
easy analysis of time series data. 

The HPF (High Performance Forecasting) procedure is used here to provide an automated way to 
generate forecasts for many time series in one step. All parameters associated with the forecast model 
are optimized based on the data. 

Finally, the chapter uses a scorecard to integrate, distribute, and analyze the information
enterprise-wide to help make the right decisions. This interface helps business users analyze data in new and
different ways to anticipate business trends and develop hypotheses. They can receive automated 
alerts to early indicators of excellence or poor performance. The interface enables IT (information 
technology) professionals to fully automate and personalize the collection and distribution of 
knowledge across the organization. 

The application presented here is available through the SAS IntelliVisor for Retail service. The 
delivery mechanism is provided through an ASP (application service provider) infrastructure. 


8.2 Forecasting Data Model 

Under the ASP framework, each night we receive customer Web logs after 12:00 AM local time. The 
Web logs are unzipped, placed in a file directory, and analyzed using SAS Web Analytics. The resulting
data capture key metrics that describe activity during the 24 hours of e-retailing in a given day. One
company using this approach is the online retailer the Vermont Country Store. (They provided a 
modified version of their data for illustration here. See www.vermontcountrystore.com.) 



[Screenshot: The Vermont Country Store home page, "Purveyors of the Practical & Hard to Find,"
displayed in a Web browser on Friday, August 16, 2002.]

The variables and their descriptions are provided in Table 8.1, followed by a listing of some of the
data.

Table 8.1 Variables and Descriptions

Variable                     Description
---------------------------  -------------------------------------------
date                         SAS Date variable formatted in DATE9.
revenue                      Revenue (TARGET)
buyer                        Number of Purchasing Sessions
dollars_per_purch_session    Average Order Value
items_per_purch_session      Average Items per Purchasing Session
catalog_quick_purch_perc     %CQS Buyers
perc_abandon_carts           Abandon Carts %
num_session                  Number of Sessions
requestcatalog_con           Number of Catalog Requests
productsuggestion_pages      Number of Product Suggestion Pages Viewed
new_cust_perc                New/Total Sessions x 100
purch_perc                   Purchase Response Rate
new_buy_perc                 New/Total Buyers x 100


[SAS Output window: listing of the first five observations]

                                         dollars_per_    items_per_
     date         revenue      buyer    purch_session   purch_session

01AUG2002     $144,268.52      2,040        $70.72           4.10
02AUG2002     $152,718.61      2,024        $75.45           4.29
03AUG2002     $147,851.66      1,984        $74.52           4.31
04AUG2002     $182,813.95      2,448        $74.68           4.20
05AUG2002     $227,971.48      2,904        $78.50           4.42

              catalog_         perc_
              quick_purch_     abandon_      num_       requestcatalog_
     date     perc             carts         session    conf

01AUG2002        21.57           9.51         98,415        4,475
02AUG2002        28.85           9.61         91,020        3,275
03AUG2002        26.21          10.93         71,610        2,475
04AUG2002        26.47          10.23         86,610        3,850
05AUG2002        28.37          10.69        110,475        4,475

              productsuggestion_   new_cust_                new_buy_
     date     pages                perc        purch_perc   perc

01AUG2002          220              63.31         2.07       68.63
02AUG2002          190              63.22         2.22       69.17
03AUG2002          180              62.76         2.77       71.37
04AUG2002          105              64.78         2.83       74.51
05AUG2002          255              64.35         2.63       72.18

8.3 The Time Series Forecasting System

Open the TSFS and select the data set to be accessed. The TSFS automatically identifies the Time ID 
variable DATE and recognizes that the data are at daily intervals. Since revenue is the target or main 
response variable of interest, select the graph button to evaluate revenue behavior over time. 


[Screenshot: Time Series Forecasting System main window. Project: SASUSER.FMSPROJ.PROJ;
Description: Vermont Country Store Retail Example; Data Set: VC_DATA.DAILY_STATS_09AUG02;
Time ID: DATE; Interval: DAY. Buttons: Develop Models, Fit Models Automatically, Produce
Forecasts, Manage Projects.]

Select the Revenue variable and then select the Graph button.



The Revenue variable shows a decrease in variability over time with some periodic tendencies. This 
is not unusual. Retail sales over the Web tend to show a daily cycle over time. (Again, this graph 
represents a display that does not reflect the true revenue at the Vermont Country Store.) 

The series looks nonstationary, and examining the autocorrelation plots suggests the need to
difference. By selecting the p=.05 button you can access the Dickey-Fuller unit root test. This test
was described previously; here it fails to reject the null hypothesis of nonstationarity only with four
augmenting lags. The TSFS employs ordinary unit root tests $(1 - \phi B)$ and unit root tests for the
seasonal polynomial $(1 - \phi B^{s})$ using k lagged differences as augmenting terms. That is, these are
factors in an autoregressive polynomial of order k + 1 and $H_0: \phi = 1$ is tested. The user should
always entertain the possibility of fitting a model outside the class of models considered here. For
example, had the pre-Christmas surge in sales been modeled, say, with a separate mean, the residuals
might look more stationary. The display below only goes up through 5 augmenting terms.
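
Outside the menu system, a comparable ordinary unit root test can be requested in PROC ARIMA. This
is a sketch that assumes the daily data set VC_DATA.DAILY_STATS_09AUG02 used in this chapter is
available; it is not the code that the TSFS itself runs, and the choice of 0 through 5 augmenting
lags simply mirrors the range shown in the TSFS display.

```sas
proc arima data=vc_data.daily_stats_09aug02;
   /* augmented Dickey-Fuller tests with 0 through 5 augmenting lags */
   identify var=revenue stationarity=(adf=(0,1,2,3,4,5));
run;
quit;
```

The output reports zero mean, single mean, and trend versions of the test for each number of
augmenting lags, so the sensitivity of the conclusion to the lag choice can be read off directly.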

Go back to the main window and request that the TSFS automatically fit models for every series.

We’re notified that 12 models will be fit for each series.

Notice the TSFS selects a seasonal exponential smoothing model for revenue. The TSFS provides an
assortment of different seasonal and nonseasonal models and chooses the “best” model based on a
selection criterion, which in this case is the minimum root mean square error. The user has some
control over the list of potential models and simple features of the data that are used initially to pare
down the list, in this case, to 12 models that might fit well.

[Screenshot: Automatic Model Fitting Results window with columns Series Name, Model Label, and a
fit statistic. Legible rows include REVENUE, PURCH_PERC (Winters Method -- Multiplicative,
0.97780), and NEW_BUY_PERC (Seasonal Exponential Smoothing, 3.56541). Buttons: Graph, Stats,
Save, Print, Close, Help.]


Select the Graph button to see the forecasts, and then select the forecast graph button to see the 
forecasts and confidence intervals. 

The graph and review of the data and forecasts using the data button suggest the seasonal exponential
smoothing model does not fit the larger revenue spikes very well, although it does a reasonable job
overall. Because exponential smoothing is analogous to fitting a unit root model, the typical
fast-spreading prediction intervals are seen as the forecast goes beyond one or two steps.

You can also go back to the Automatic Fitting Results screen to evaluate the forecasts for each 
series individually. 

The TSFS can also be further automated by using the forecast command and the SAS/AF 
Forecast Application Command Builder. 


8.4 HPF Procedure 

The HPF procedure can forecast millions of time series at a time, with the series organized into 
separate variables or across BY groups. You can use the following forecasting models: 

Smoothing Models: 

□ Simple 

□ Double 

□ Linear 

□ Damped Trend 

□ Seasonal 

□ Winters Method (additive and multiplicative) 

Additionally, transformed versions of these models are provided: 

□ Log 

□ Square Root 

□ Logistic 

□ Box-Cox 

For intermittent time series (series where a large number of values are zero values), you can use 
Croston’s method (Croston 1977). 

All parameters associated with the forecast model are optimized based on the data. The HPF 
procedure writes the time series with extrapolated forecasts, the series summary statistics, the 
forecast confidence limits, the parameter estimates, and the fit statistics to output data sets. 

The HPF procedure step below examines the application of the automatic forecasting technique to
the evaluation of seven different forecasting methods described above. The program creates a
Forecasts data set that contains forecasts for seven periods beyond the end of the input data set
VC_DATA.DAILY_STATS_09AUG02. The data represent daily values for Revenue, a variable
describing the total purchase dollars for a given day. The daily variable indicator, date,
is formatted date9.

The GPLOT procedure is used to display the actual values, predicted values, and upper and lower 
confidence limits overlaid on the same graph. 
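
The description above can be sketched as follows. This is an assumed reconstruction written for
illustration, not the program used to produce the book's output: the output data set names OUTFOR,
OUTEST, and OUTSTAT and the SYMBOL settings are choices made here. MODEL=BESTALL asks PROC HPF to
choose among its candidate smoothing models, and SELECT=RMSE states the selection criterion
explicitly.

```sas
proc hpf data=vc_data.daily_stats_09aug02
         out=forecasts outfor=outfor outest=outest outstat=outstat
         lead=7 print=all;
   id date interval=day;
   forecast revenue / model=bestall select=rmse;
run;

symbol1 i=none v=dot  c=black;      /* actual values */
symbol2 i=join v=none c=blue;       /* predicted values */
symbol3 i=join v=none c=red l=2;    /* lower confidence limit */
symbol4 i=join v=none c=red l=2;    /* upper confidence limit */

proc gplot data=outfor;
   plot actual*date=1 predict*date=2 lower*date=3 upper*date=4 / overlay;
   format date date9.;
run;
quit;
```

The OUTFOR= data set holds the ACTUAL, PREDICT, LOWER, and UPPER series used in the overlay, while
OUT= (here FORECASTS) holds the input series extended by the seven forecast periods.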

The Winters additive method seasonal exponential smoothing model fits best based on the
RMSE statistic, and only the level weight is statistically different from 0. When performing these
operations in an automatic fashion on many series, it is often found that the models tend to be
overparameterized.

The forecasts for the next seven days are displayed below, in addition to the standard errors and
upper and lower 95% prediction limits. The lower 95% prediction limit falls below 0 as you
extend well beyond the end of the historical data.

The statistics of fit for the selected model are given as a reference for model comparison. As noted,
these are calculated based on the full range of data. A detailed description of these summary statistics
can be found by consulting the SAS System 9 documentation.

A forecast summary shows values for the next seven days, and a sum forecast for the seven-day total
is displayed at the bottom.

The graph below suggests a drop in purchasing sessions in early January. The Winters additive
method of seasonal exponential smoothing does a nice job of tracking the historical data shown by
the heavy middle graph line.

8.5 Scorecard Development

Each day the Vermont Country Store is provided with a report called a “scorecard” that examines its
key metrics (variables of interest). The revenue is denoted Revenue (TARGET). The actual value for
the day is removed and then forecasted using the HPF procedure. Since the current day’s value is
removed (9Aug02 in this case), the standard error and forecast estimate are independent of today’s
observed value. Standardized differences denoted “Difference,” $(Y_t - \hat{Y}_t)/s_{\hat{y}}$, are also displayed for
each metric.

8.6 Business Goal Performance Metrics

From a retailing business perspective, often you would like the actual values of a metric like Buyer 
Percent to be larger than the Forecast value so that you are doing better than expected. For a metric 
like Error Page Percent, smaller values are preferred. For each metric a directional business 
performance measure is computed for the day. If the preferred direction is greater than the forecast, 
the calculation is 


$$\left(\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-t^{2}/2}\,dt \,\Big/\, 2 + .5\right) \times 100$$

where $x = (Y - \hat{Y})/s_{\hat{y}}$


Thus the Business performance has a minimum value of 50% (when Y is small). When Y matches 
the prediction, the Business performance statistic has a value of 75%; it increases toward 100% as Y 
gets larger than the prediction. 

When the preferred direction of the business movement is less than the prediction, the Business 
performance measure is calculated as 


$$\left(1 - \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-t^{2}/2}\,dt \,\Big/\, 2\right) \times 100$$

where $x = (Y - \hat{Y})/s_{\hat{y}}$
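
Both directional measures reduce to the standard normal distribution function evaluated at the
standardized difference, so they can be computed with the PROBNORM function. The sketch below uses
invented metric names and values purely for illustration; the data set PERF and the indicator
HIGHER_IS_BETTER are hypothetical and do not come from the scorecard application.

```sas
data perf;
   input metric :$24. actual forecast std higher_is_better;
   x = (actual - forecast)/std;                     /* standardized difference */
   if higher_is_better then
      bus_perf = (probnorm(x)/2 + 0.5)*100;         /* preferred direction: above forecast */
   else
      bus_perf = (1 - probnorm(x)/2)*100;           /* preferred direction: below forecast */
   datalines;
buyer_percent       2.6  2.4  0.2  1
error_page_percent  1.0  1.0  0.3  0
;
run;

proc print data=perf;
run;
```

A metric that lands exactly on its forecast, like the second row, scores 75 under either formula,
and the score moves toward 100 or 50 as the actual value moves in the preferred or opposite direction.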


Using this approach, each metric in the table has a Business performance measure. The AUTOREG
procedure is then applied by regressing the target (Revenue in this example) on the other metrics and
treating 1 - p-values as weight statistics. The sum of products of weight statistics and Business
Performance measures gives an overall daily mean score as shown in the previous display.


8.7 Graphical Displays 

You can go to the scorecard table and select each metric to display the predictions and limits in a 
graphical format. In the following display, the scattered black dots represent the observed data, and 
the dots connected by lines represent the predictions. On the target day (9Aug02) the observed value 
is removed, so we designate the forecasts and upper and lower 95% prediction intervals with plus 
signs. Throughout the other historical data, the forecasts and forecast bounds are based on a model 
developed from the full data set that includes 9Aug02. The same is true of the forecasts and bounds 
beyond 9Aug02. 

If the HPF procedure selects a seasonal model, you will see a display of the daily averages, as shown
below. By clicking on a given day of the week, you can also see the associated history for that day
over the past history of the data.

The drop in revenue is also displayed in the chart of past Sunday revenues.

The scorecard also supports the output from a regression with autocorrelation and the ability to solve
for inputs one at a time when seeking input values that deliver a specified level of a target. This is
done using the SOLVE statement in PROC MODEL. By simply selecting the Goal Seeking
Scorecard, you can find values of the inputs that satisfy increasing values of the target Revenue. An
example of fitting a model and using it to later solve for values of the inputs is illustrated below. We
restrict the explanatory variables to Purchasing Sessions, Average Order Value, and Product
Suggestions to illustrate how the back solution is obtained.

The 0 Percent column indicates the current daily settings for the metrics on 09Aug02. Increasing the
target by 5% would set revenue at $199,905.09. To achieve this goal would require 2769 purchasing
sessions, assuming all the other inputs remain at their 0 percent level (i.e., the 09Aug02 value).

It is interesting to note that the number of product suggestions would need to drop to 54.5 to achieve
this 5% increase. In other words, fewer visitors would be suggesting alternative products to the site
and would be more apt to purchase the observed products. Based on the regression results, the
number of product suggestions becomes negative (unreasonable) as revenue increases beyond 5%.
The display uses metadata (data that characterize positive and negative business directions,
acceptable ranges, etc.) that describe reasonable values and set the corresponding negative values to
missing. The increasing values for purchasing sessions and average order size provide reasonable
results.

8.8 Goal-Seeking Model Development

The MODEL procedure analyzes models in which the relationships among the variables comprise a
system of one or more nonlinear equations. The %AR macro can be used to specify models with
autoregressive error processes similar to the AUTOREG procedure. In this case we are regressing
revenue on buyer, dollars_per_purch_session, and productsuggestion_pages.

proc model data=vc_data.daily_stats_09aug02 outmodel=outmodel;
   revenue = b0 + b1*buyer + b2*dollars_per_purch_session +
             b3*productsuggestion_pages;
   %ar(revenue, 7, , 1 5 7);
   fit revenue / outest=outest maxiter=1000
       method=marquardt;
run;

data goal goal2;
   set vc_data.daily_stats_09aug02 (where=(date='09aug02'd) keep=
       revenue buyer dollars_per_purch_session productsuggestion_pages
       date);
   output goal;
   revenue = round(1.05*revenue);   /* 5% increase */
   output goal2;
run;

proc model model=outmodel;
   solve buyer / estdata=outest data=goal2
         out=outsolve1;
   solve dollars_per_purch_session / estdata=outest data=goal2
         out=outsolve2;
   solve productsuggestion_pages / estdata=outest data=goal2
         out=outsolve3;
run;

The SOLVE data set is created to view values of the input variables that satisfy the 5% increase for
the target variable Revenue.
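
One way such a data set could be assembled is to stack the current values on top of the three
solutions. This is a sketch only, since the original step is not shown here: it assumes the GOAL
and OUTSOLVE1 through OUTSOLVE3 data sets created by the program above, and the name SOLVE is taken
from the text.

```sas
data solve;
   /* current values first, then one solved input per SOLVE statement */
   set goal outsolve1 outsolve2 outsolve3;
run;

proc print data=solve;
   var revenue buyer dollars_per_purch_session productsuggestion_pages;
run;
```

Each of the last three observations carries the increased revenue target together with the one
input that was solved for, with the remaining inputs held at their 09AUG2002 values.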

The output below examines the parameter estimates and test statistics. Lags 1, 5, and 7 for the
autoregressive errors are statistically different from 0. The coefficients for purchasing
sessions and average order value are positive, and the coefficient for product suggestions is
negative. The R square and significant parameters and AR terms suggest a reasonable model.

Observation 1 in the SOLVE data set shows the current values of the four variables for
09AUG2002. Using the fitted model with autoregressive errors, observations 2 through 4
demonstrate the changes in each individual input required to achieve a 5% increase in revenue,
assuming the other inputs are at their current levels. These match the Goal Seeking Scorecard results.

8.9 Summary

This example illustrates how you can apply automated forecasting techniques in a data mining 
environment. SAS IntelliVisor for Retail through the ASP delivery channel requires the ability to 
construct analytic results quickly in a batch environment without user intervention. The use of a daily 
scorecard allows the consumer to focus on what’s important and how things are changing over time. 
By focusing on a goal-seeking report, you can set goals and determine the changes required to 
produce increasing returns on investment. 

MODEL statement 7, 18 
regression with transformed data 21-26 
silver stocks (example) 114-116 
regression, linear 6-12 
See also autoregression 
logarithmically transformed data 21-26 
with time series errors (example) 164, 167-178 
with time series errors and unequal variances 239-256 
very regular seasonality 13-20 
regression, ordinary least squares (OLS) 255, 270 

regular seasonality 13-20 
residuals, chi-square check of 79, 173 
RESTRICT statement, STATESPACE 
procedure 287-288, 306 
example, univariate (simulated) 311 
retail sales (example) 13-20, 223-230 
river flow rates (example) 202-212 
cross-spectral analysis 350-354, 356 

S 

SAS/ETS software 2-6 
SBC information criterion 134 
SCAN method 131-133, 137-140 
scorecard development 375-380 
seasonal modeling 143-164 
ACF 143-145 

construction workers (example) 146-152 
highly regular seasonality 13-20 
international airline passengers (example) 152-164 

modeling with mean 13 
possibilities for 2-3 
regular seasonality 13-20 
seasonality, defined 1 
very regular seasonality 13-20 
Winters exponentially smoothed trend-seasonal models 2, 371 
X-11 seasonal adjustment package 2 
second-order AR processes 39 

Yule-Walker equations for covariances 42, 292 


shift in response (leading indicators) 166, 210, 354-357 

simulated univariate state space modeling 
307-314 

shocks 

See cointegration 
See impulse response function 
See intervention analysis 
silver stocks (example) 45-48 

models for nonstationary data 114-117 
simple transfer functions 165 

housing starts (example) 179-182 
sinusoidal component modeling 
See spectral analysis 
smoothed periodograms 333-334, 340-341 

smoothing, exponential 2 

Winters exponentially smoothed trend-seasonal models 2, 371 
SOLVE statement, MODEL procedure 
380 

SPECTRA procedure 3, 326-328 
A option 346 
ADJMEAN option 327 
COEFF option 327 
cross-amplitude and phase spectra 
344-345 

CROSS option 346 
cross-spectral analysis 346-350 
cross-spectral analysis (example) 
350-354, 356 

estimating spectrum 340-341 
K option 346 
PH option 346 
VAR statement 355 
WEIGHT statement 340 
WHITETEST option 329 
spectral analysis 323-357 

aliasing and extremely fast fluctuations 
334-335 

cross-spectral analysis 341-357 
cross-spectral analysis (example) 350-354, 356 

estimated (smoothed periodograms) 
333-334, 340-341 
gain, phase, and pure delay 354-357 
harmonic frequencies 330-334 
phase spectra 344-345, 354-357 





plant enzyme activity (example) 324-326 

spectral density 335-339 
white noise, testing for 328-329 
spectral density 335-339 
spectral window 341 
squared coherency 346-349 
squared gain 339 
state space modeling 283-321 
autocorrelation 314 
block identifiable systems 296 
canonical correlation analysis 305-307 
equivalence with vector ARMA modeling 
294-297 

fur trapping (example) 314-321 
input matrix 287 
multivariate examples 285-294, 
301-302, 314-321 
transition matrix 287 
univariate examples 283-284, 298-301 
univariate examples (simulated) 307-314 

Yule-Walker equations for covariances 
41-44 

state vectors 284 

See also state space modeling 
determined from covariances 305 
vector ARMA models 294-297 
STATESPACE procedure 2-3, 286, 
290-291, 302-321 
CANCORR option 307 
canonical correlation analysis 305-307 
COVB option 311 
example (simulated) 307-314 
FORM statement 314 
fur trapping (example) 314-321 
ITPRINT option 307 
NOEST option 306 
relationship with other procedures 4-6 
RESTRICT statement 287-288, 306, 311 

state vectors determined from covariances 
305 

stationarity 44 
ARMA models 55 
nonstationary series 102-103 
nonstationary series, models for 113-123 


steel and iron exports (example) 90-94 
stock price series, IBM (example) 105-113 

stock trading volume, American Airlines 
(example) 237-238 

T 

terrorist attack (example) 237-238 
time series, explanatory 3 
time series errors, regression with 164, 
167-178, 239-245 
unequal variances 245-256 
university energy demand (example) 
241-245 

time series forecasting system (TSFS) 
362-368 

time series identification, ARMA models 
estimating autocorrelation functions 56 
79 

estimating autocorrelation functions 
(Y8 series examples) 81-89 
iron and steel exports (example) 90-94 
time series models with sinusoidal 
components 
See spectral analysis 
transition matrix (state space modeling) 
287 

trapping (example) 314-321 
Treasury bill rates (example) 22-26 
trends 1 

detrending data 317-321 
intervention analysis 165, 213-223, 233-238 

linear, moving with differencing 
123-128 

TSFS (time series forecasting system) 
362-368 

U 

ULS (unconditional least squares) method 
34-35, 39 

ARIMA procedure estimation methods 
95-97 

ARMA model forecasting methods 
53-54 

unconditional sum of squares (USS) 34-36 
unequal variances and time series errors, 
regression with 245-256 
unit root nonstationarity 103 
cointegration 263-265 
models for nonstationary data 113-123 
U.S. housing starts (example) 179-182 
U.S. iron and steel exports (example) 90-94 

U.S. Treasury bill rates (example) 22-26 
univariate state space modeling examples 
283-284, 298-301 
fur trapping (example) 314-321 
simulated 307-314 
university energy demand (example) 
241-245 

USS (unconditional sum of squares) 34-36 

V 

VAR models 

See vector autoregression models 
VAR= option, IDENTIFY statement 
(ARIMA) 80 

VAR statement, SPECTRA procedure 355 
VARMAX procedure 271 

Amazon.com stock price (example) 
275-281 

ARIMA procedure 2-3 
COINTEG statement 275 
COINTEST option 275 
relationship with other procedures 4-6 
tests for cointegration 176 
VAR models, calculations 275-281 
vector autoregression (VAR) models 
256-281 

cointegration and unit roots 263-265 
eigenvalues 258-259 
example 265-270 
higher-order models 260-263 
impulse response function 260 
state space models, equivalence to 
294-297 

STATESPACE procedure for 302-321 
VARMAX procedure calculations 275-281 

vectors, state 
See state vectors 

Vermont Country Store data (example) 360-361 


business goal performance metrics 376 
scorecard development 375 
TSFS (time series forecasting system) 
362-368 

very regular seasonality 13-20 

W 

WEIGHT statement, SPECTRA procedure 
340 

white noise 27 

testing for 328-329 
WHITETEST option, SPECTRA 
procedure 329 

Winters exponentially smoothed trend-seasonal models 2, 371 
Wold representation 28 

X 

X-11 seasonal adjustment package 2 
X11 procedure 2 
X12 procedure 3 

Y 

Y8 series examples 81-89 
Yule-Walker equations for covariances 
41-44 

state space modeling 292 